hadoop apache-pig hcatalog bigdata

Sum of an unnamed column in Pig


 shipnode,delivery_method ,<unnamed>
 (9935,PICK,2)
 (9960,PICK,2)
 (9969,PICK,1)
 (9963,SHP,1)
 (9989,SHP,1)
 (9995,SHP,1)
 (9965,SHP,1)
 (9995,SHP,1)

This is the output of:

 grunt> group_all_shipnode = GROUP
 >> union_all 
 >> BY(
 >> shipnode,delivery_method
 >> )
 >> ;

The last column is unnamed. Now I want to group by shipnode and delivery_method and take the sum of the third column, so that the result is:

 (9935,PICK,2)
 (9960,PICK,2)
 (9969,PICK,1)
 (9963,SHP,1)
 (9989,SHP,1)
 (9995,SHP,2) <-- sum of the two matching rows
 (9965,SHP,1)

I am trying this:

 grunt> sum_group_all_shipnode =FOREACH group_all_shipnode 
 >> GENERATE FLATTEN(group) as(shipnode:chararray, delivery_method:chararray),
 >> sum($1.$2);

which produces this error:

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve sum using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]

Solution

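  • Pig's built-in function names are case-sensitive, so sum must be written as SUM; the lowercase name is what triggers ERROR 1070. It is also clearer to refer to the grouped bag by the name of the relation from your load statement rather than by position ($1.$2). For example, assuming you are loading the data into a relation A: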

    A = LOAD 'data.csv' USING PigStorage(',');
    -- Group on shipnode ($0) and delivery_method ($1); $2 is the column to sum
    group_all_shipnode = GROUP A BY ($0,$1);
    sum_group_all_shipnode = FOREACH group_all_shipnode 
                             GENERATE 
                                 FLATTEN(group) AS (shipnode:chararray, delivery_method:chararray),
                                 SUM(A.$2);
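
Applied to the relation from the question, the same idea might look like the sketch below. It assumes union_all already exists with the fields shipnode and delivery_method followed by a numeric third field, as the sample rows suggest. The alias total is hypothetical and only gives the summed column a name. The types are left off the AS clause because the actual schema of union_all is not shown.

```pig
-- Assumption: union_all has fields (shipnode, delivery_method, <unnamed numeric field>)
group_all_shipnode = GROUP union_all BY (shipnode, delivery_method);

-- After GROUP, each record is (group, union_all), where union_all is the bag of
-- matching rows, so union_all.$2 projects the third field of every row in the bag.
sum_group_all_shipnode = FOREACH group_all_shipnode
                         GENERATE
                             FLATTEN(group) AS (shipnode, delivery_method),
                             SUM(union_all.$2) AS total;  -- SUM must be uppercase

DUMP sum_group_all_shipnode;
```

With the sample rows above, the two (9995,SHP,1) rows should be combined into a single row for (9995,SHP) with a total of 2.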