
Error 1045 on SUM function in Pig Latin with an int


The following Pig Latin script:

data = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, size:int);

splitDate = foreach data generate  size as size:int , ip as ip,  FLATTEN(STRSPLIT(date, ':')) as h;

groupedIp = group splitDate by h.$1;

a = foreach groupedIp{
    added = foreach splitDate generate SUM(size); --
    generate added;
};


describe a;

gives me the error:

ERROR 1045: <file 3.pig, line 10, column 39> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

This error makes me think I need to cast size to an int, but if I describe groupedIp, I get the following schema:

groupedIp: {group: bytearray,splitDate: {(size: int,ip: chararray,h: bytearray)}}

This indicates that size is already an int and should be usable by the SUM function.

Am I calling the SUM function incorrectly? Let me know if you would like to see anything else, such as the input file.


Solution

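  • SUM takes a bag as input, but inside the nested foreach over splitDate, size refers to a single int field of each tuple, not a bag, so Pig cannot match it to any SUM implementation.
    Eliminate the nested foreach and pass the bag projection splitDate.size instead: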

    a = foreach groupedIp generate SUM(splitDate.size);
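
The same fix can be sketched in two ways, both of which also keep the group key in the output so each sum can be tied back to its group. This assumes data, splitDate and groupedIp are defined exactly as in the question; the aliases total_size, sizes and b are hypothetical names chosen for illustration and do not appear in the original script.

```pig
-- Assumes data, splitDate and groupedIp are defined exactly as in the question.
-- total_size, sizes and b are hypothetical names used only for this sketch.

-- Variant 1: no nested block. splitDate.size projects the bag of sizes
-- for each group, which is the kind of input SUM expects.
a = foreach groupedIp generate group, SUM(splitDate.size) as total_size;

-- Variant 2: keep the nested block, but project the bag first and
-- apply SUM in the generate clause rather than inside a nested foreach.
b = foreach groupedIp {
    sizes = splitDate.size;  -- a bag of (size) tuples for the current group
    generate group, SUM(sizes) as total_size;
};

describe a;
describe b;
```

Both relations should describe as a group column followed by the summed size. For an int field, SUM is documented to return a long, so expect total_size to be typed accordingly.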