
ERROR 1128: Cannot find field dryTemp


My Pig script, which processes temperature data, fails with an error. I have included the code and the error below to make the problem easier to understand.

The error is reported at line 38, column 15. I tried removing dryTemp, but that only produced a different error.

Code:

 --Load files into relations
    month1 = LOAD 'hdfs:/data/big/data/weather/weather/201201hourly.txt' USING PigStorage(',');
    month2 = LOAD 'hdfs:/data/big/data/weather/weather/201202hourly.txt' USING PigStorage(',');
    month3 = LOAD 'hdfs:/data/big/data/weather/weather/201203hourly.txt' USING PigStorage(',');
    month4 = LOAD 'hdfs:/data/big/data/weather/weather/201204hourly.txt' USING PigStorage(',');
    month5 = LOAD 'hdfs:/data/big/data/weather/weather/201205hourly.txt' USING PigStorage(',');
    month6 = LOAD 'hdfs:/data/big/data/weather/weather/201206hourly.txt' USING PigStorage(',');

    --Combine relations
    months = UNION month1, month2, month3, month4, month5, month6;

    /* Splitting relations
    SPLIT months INTO 
            splitMonth1 IF SUBSTRING(date, 4, 6) == '01',
            splitMonth2 IF SUBSTRING(date, 4, 6) == '02',
            splitMonth3 IF SUBSTRING(date, 4, 6) == '03',
            splitRest IF (SUBSTRING(date, 4, 6) == '04' OR SUBSTRING(date, 4, 6) == '04');
    */

    /*  Joining relations

    stations = LOAD 'hdfs:/data/big/data/QCLCD201211/stations.txt' USING PigStorage() AS (id:int, name:chararray)

    JOIN months BY wban, stations by id;

    */

    --filter out unwanted data
    clearWeather = FILTER months BY skyCondition == 'CLR';

    --Transform and shape relation
    shapedWeather = FOREACH clearWeather GENERATE date, SUBSTRING(date, 0, 4) as year, SUBSTRING(date, 4, 6) as month, SUBSTRING(date, 6, 8) as day, skyCondition, dryTemp;

    --Group relation specifying number of reducers
    groupedByMonthDay = GROUP shapedWeather BY (month, day) PARALLEL 10;

    --Aggregate relation
    aggedResults = FOREACH groupedByMonthDay GENERATE group as MonthDay, AVG(shapedWeather.dryTemp), MIN(shapedWeather.dryTemp), MAX(shapedWeather.dryTemp), COUNT(shapedWeather.dryTemp) PARALLEL 10;

    --Sort relation
    sortedResults = ORDER aggedResults BY $1 DESC;

    --Store results in HDFS
    STORE sortedResults INTO 'hdfs:/data/big/data/weather/pigresults' USING PigStorage(':');

The full error is below; it is fairly long. I am still learning Pig. I suspect the error is caused by a field whose type is not recognized, but I do not know how to fix it. I hope you can help.

Error:

ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1691)
    at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
    at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
    at org.apache.pig.Main.run(Main.java:607)
    at org.apache.pig.Main.main(Main.java:156)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: Failed to parse: Pig script failed to parse: 
<file Documentos/pig/weather.pig, line 38, column 15> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
    ... 15 more
Caused by: 
<file Documentos/pig/weather.pig, line 38, column 15> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
    at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1017)
    at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15870)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
    ... 16 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
    at org.apache.pig.newplan.logical.expression.DereferenceExpression.translateAliasToPos(DereferenceExpression.java:215)
    at org.apache.pig.newplan.logical.expression.DereferenceExpression.getFieldSchema(DereferenceExpression.java:149)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:148)
    at org.apache.pig.newplan.logical.expression.DereferenceExpression.accept(DereferenceExpression.java:84)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:245)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
    at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1015)
    ... 22 more

Here are a few lines of the file 201211 hourly.txt:

    WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
    03011,20120101,0015,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.5, ,-9, ,-23.0, , 24, , 5, ,120, , , ,21.70, , , , , ,M, ,AA, , , ,30.43,
    03011,20120101,0035,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 6, ,130, , , ,21.70, , , , , ,M, ,AA, , , ,30.43,
    03011,20120101,0055,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.5, , -13, ,-25.0, , 21, , 0, ,000, , , ,21.71, , , , , ,M, ,AA, , , ,30.44,
    03011,20120101,0115,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.1, ,-8, ,-22.0, , 27, , 0, ,000, , , ,21.71, , , , , ,M, ,AA, , , ,30.44,
    03011,20120101,0135,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.4, , -11, ,-24.0, , 23, , 0, ,000, , , ,21.72, , , , , ,M, ,AA, , , ,30.45,
    03011,20120101,0155,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.5, , -13, ,-25.0, , 21, , 6, ,130, , , ,21.72, , , , , ,M, ,AA, , , ,30.45,
    03011,20120101,0215,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 5, ,090, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,
    03011,20120101,0235,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 6, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
    03011,20120101,0255,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.4, , -11, ,-24.0, , 23, , 7, ,130, , , ,21.74, , , , , ,M, ,AA, , , ,30.48,
    03011,20120101,0315,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.4, ,-8, ,-22.0, , 25, , 9, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
    03011,20120101,0335,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.4, ,-8, ,-22.0, , 25, , 8, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
    03011,20120101,0355,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 7, ,120, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,
    03011,20120101,0415,0,CLR, ,10.00, , , ,23, ,-5.0, ,14, ,-9.7, , -13, ,-25.0, , 19, , 7, ,130, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,


Solution

  • I made a few modifications to your script:
    1. Loaded the data with a proper schema (you can change the data type of each field to suit your needs). Your LOAD statements had no AS clause, so field names such as dryTemp were never defined.
    2. Combined the six LOAD statements into one.
    3. Removed the commented-out code.

    I tested the Pig script below with your input and it works; the output is pasted below as well.

    PigScript:

    --Load all the files into one relation
    months = LOAD 'hdfs:/data/big/data/weather/weather/20120[1-6]hourly.txt' USING PigStorage(',') AS (WBAN:int,Date:chararray,Time:chararray,StationType:int,SkyCondition:chararray,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit:int,DryBulbFarenheitFlag,DryBulbCelsius:double,DryBulbCelsiusFlag,WetBulbFarenheit:int,WetBulbFarenheitFlag,WetBulbCelsius:double,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag);

    --Filter out unwanted data
    clearWeather = FILTER months BY SkyCondition == 'CLR';

    --Transform and shape relation
    shapedWeather = FOREACH clearWeather GENERATE Date,
                        SUBSTRING(Date,0,4) AS year,
                        SUBSTRING(Date,4,6) AS month,
                        SUBSTRING(Date,6,8) AS day,
                        SkyCondition,
                        DryBulbFarenheit AS dryTemp;

    --Group relation specifying number of reducers
    groupedByMonthDay = GROUP shapedWeather BY (month, day) PARALLEL 10;

    --Aggregate relation
    aggedResults = FOREACH groupedByMonthDay GENERATE group AS MonthDay,
                        AVG(shapedWeather.dryTemp),
                        MIN(shapedWeather.dryTemp),
                        MAX(shapedWeather.dryTemp),
                        COUNT(shapedWeather.dryTemp) PARALLEL 10;

    --Sort relation
    sortedResults = ORDER aggedResults BY $1 DESC;

    --Store results in HDFS
    STORE sortedResults INTO 'hdfs:/data/big/data/weather/pigresults' USING PigStorage(':');

    Output (based on the input sample above):

       (01,01):21.615384615384617:21:23:13
    
     MonthDay:(01,01)
     Avg:21.615384615384617
     Min:21
     Max:23
     Count:13
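
    If you want to check for yourself why the original script failed, a small sketch like the one below may help. It reuses one of the paths from the question; the relation names raw and named are made up for illustration. The idea is that DESCRIBE shows which field names Pig knows about, and that columns can still be reached by position ($0, $1, ...) when no schema was declared. The positions assume the column order of the header line shown in the question.

```pig
-- Hypothetical check: load one file without an AS clause, as in the question.
raw = LOAD 'hdfs:/data/big/data/weather/weather/201201hourly.txt' USING PigStorage(',');
DESCRIBE raw;      -- no schema was declared, so Pig has no field names to report

-- Name only the columns the script needs, by 0-based position.
-- Assumed from the header: $1 = Date, $4 = SkyCondition, $10 = DryBulbFarenheit.
named = FOREACH raw GENERATE (chararray)$1 AS Date,
                             (chararray)$4 AS SkyCondition,
                             (int)$10 AS dryTemp;
DESCRIBE named;    -- should now list Date, SkyCondition and dryTemp with their types
```

    Declaring the full schema in the LOAD statement, as in the script above, is probably easier to read; the positional form is just a lighter alternative when only a few columns are needed.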