I ran a Pig script over temperature data and it gave me an error. I have put the code and the error below to make the problem easier to understand.
The error is at line 38, column 15. I tried deleting dryTemp, but that gave another error.
Code:
--Load files into relations
month1 = LOAD 'hdfs:/data/big/data/weather/weather/201201hourly.txt' USING PigStorage(',');
month2 = LOAD 'hdfs:/data/big/data/weather/weather/201202hourly.txt' USING PigStorage(',');
month3 = LOAD 'hdfs:/data/big/data/weather/weather/201203hourly.txt' USING PigStorage(',');
month4 = LOAD 'hdfs:/data/big/data/weather/weather/201204hourly.txt' USING PigStorage(',');
month5 = LOAD 'hdfs:/data/big/data/weather/weather/201205hourly.txt' USING PigStorage(',');
month6 = LOAD 'hdfs:/data/big/data/weather/weather/201206hourly.txt' USING PigStorage(',');
--Combine relations
months = UNION month1, month2, month3, month4, month5, month6;
/* Splitting relations
SPLIT months INTO
splitMonth1 IF SUBSTRING(date, 4, 6) == '01',
splitMonth2 IF SUBSTRING(date, 4, 6) == '02',
splitMonth3 IF SUBSTRING(date, 4, 6) == '03',
splitRest IF (SUBSTRING(date, 4, 6) == '04' OR SUBSTRING(date, 4, 6) == '04');
*/
/* Joining relations
stations = LOAD 'hdfs:/data/big/data/QCLCD201211/stations.txt' USING PigStorage() AS (id:int, name:chararray)
JOIN months BY wban, stations by id;
*/
--filter out unwanted data
clearWeather = FILTER months BY skyCondition == 'CLR';
--Transform and shape relation
shapedWeather = FOREACH clearWeather GENERATE date, SUBSTRING(date, 0, 4) as year, SUBSTRING(date, 4, 6) as month, SUBSTRING(date, 6, 8) as day, skyCondition, dryTemp;
--Group relation specifying number of reducers
groupedByMonthDay = GROUP shapedWeather BY (month, day) PARALLEL 10;
--Aggregate relation
aggedResults = FOREACH groupedByMonthDay GENERATE group as MonthDay, AVG(shapedWeather.dryTemp), MIN(shapedWeather.dryTemp), MAX(shapedWeather.dryTemp), COUNT(shapedWeather.dryTemp) PARALLEL 10;
--Sort relation
sortedResults = ORDER aggedResults BY $1 DESC;
--Store results in HDFS
STORE sortedResults INTO 'hdfs:/data/big/data/weather/pigresults' USING PigStorage(':');
The error is below; it is fairly long. I do not know much about Pig yet, as I am still studying it. I believe the error has to do with a variable type that is not recognized, but I do not know how to fix it. I hope you can help me.
Error:
ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1691)
at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1411)
at org.apache.pig.PigServer.parseAndBuild(PigServer.java:344)
at org.apache.pig.PigServer.executeBatch(PigServer.java:369)
at org.apache.pig.PigServer.executeBatch(PigServer.java:355)
at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
at org.apache.pig.Main.run(Main.java:607)
at org.apache.pig.Main.main(Main.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: Failed to parse: Pig script failed to parse:
<file Documentos/pig/weather.pig, line 38, column 15> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:196)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1678)
... 15 more
Caused by:
<file Documentos/pig/weather.pig, line 38, column 15> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1017)
at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:15870)
at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1933)
at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:1102)
at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:560)
at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:421)
at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:188)
... 16 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field dryTemp in :bytearray,year:chararray,month:chararray,day:chararray,:bytearray,:bytearray
at org.apache.pig.newplan.logical.expression.DereferenceExpression.translateAliasToPos(DereferenceExpression.java:215)
at org.apache.pig.newplan.logical.expression.DereferenceExpression.getFieldSchema(DereferenceExpression.java:149)
at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:264)
at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:148)
at org.apache.pig.newplan.logical.expression.DereferenceExpression.accept(DereferenceExpression.java:84)
at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visitAll(SchemaResetter.java:67)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:122)
at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:245)
at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:114)
at org.apache.pig.parser.LogicalPlanBuilder.buildForeachOp(LogicalPlanBuilder.java:1015)
... 22 more
Here are a few lines of the file 201211 hourly.txt:
WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit,DryBulbFarenheitFlag,DryBulbCelsius,DryBulbCelsiusFlag,WetBulbFarenheit,WetBulbFarenheitFlag,WetBulbCelsius,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag
03011,20120101,0015,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.5, ,-9, ,-23.0, , 24, , 5, ,120, , , ,21.70, , , , , ,M, ,AA, , , ,30.43,
03011,20120101,0035,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 6, ,130, , , ,21.70, , , , , ,M, ,AA, , , ,30.43,
03011,20120101,0055,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.5, , -13, ,-25.0, , 21, , 0, ,000, , , ,21.71, , , , , ,M, ,AA, , , ,30.44,
03011,20120101,0115,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.1, ,-8, ,-22.0, , 27, , 0, ,000, , , ,21.71, , , , , ,M, ,AA, , , ,30.44,
03011,20120101,0135,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.4, , -11, ,-24.0, , 23, , 0, ,000, , , ,21.72, , , , , ,M, ,AA, , , ,30.45,
03011,20120101,0155,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.5, , -13, ,-25.0, , 21, , 6, ,130, , , ,21.72, , , , , ,M, ,AA, , , ,30.45,
03011,20120101,0215,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 5, ,090, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,
03011,20120101,0235,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 6, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
03011,20120101,0255,0,CLR, ,10.00, , , ,21, ,-6.0, ,13, ,-10.4, , -11, ,-24.0, , 23, , 7, ,130, , , ,21.74, , , , , ,M, ,AA, , , ,30.48,
03011,20120101,0315,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.4, ,-8, ,-22.0, , 25, , 9, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
03011,20120101,0335,0,CLR, ,10.00, , , ,23, ,-5.0, ,15, ,-9.4, ,-8, ,-22.0, , 25, , 8, ,120, , , ,21.74, , , , , ,M, ,AA, , , ,30.47,
03011,20120101,0355,0,CLR, ,10.00, , , ,21, ,-6.0, ,14, ,-10.2, ,-9, ,-23.0, , 26, , 7, ,120, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,
03011,20120101,0415,0,CLR, ,10.00, , , ,23, ,-5.0, ,14, ,-9.7, , -13, ,-25.0, , 19, , 7, ,130, , , ,21.73, , , , , ,M, ,AA, , , ,30.46,
I have made a few modifications to your script:
1. Loaded the data with a proper schema (you can change the data type of each field according to your needs).
2. Combined the six LOAD statements into a single LOAD.
3. Removed the commented-out code.
I have tested the Pig script below with your input and it works fine; the output is pasted below as well.
PigScript:
--Load all the files into relations
months = LOAD 'hdfs:/data/big/data/weather/weather/20120[1-6]hourly.txt' USING PigStorage(',') AS (WBAN:int,Date:chararray,Time:chararray,StationType:int,SkyCondition:chararray,SkyConditionFlag,Visibility,VisibilityFlag,WeatherType,WeatherTypeFlag,DryBulbFarenheit:int,DryBulbFarenheitFlag,DryBulbCelsius:double,DryBulbCelsiusFlag,WetBulbFarenheit:int,WetBulbFarenheitFlag,WetBulbCelsius:double,WetBulbCelsiusFlag,DewPointFarenheit,DewPointFarenheitFlag,DewPointCelsius,DewPointCelsiusFlag,RelativeHumidity,RelativeHumidityFlag,WindSpeed,WindSpeedFlag,WindDirection,WindDirectionFlag,ValueForWindCharacter,ValueForWindCharacterFlag,StationPressure,StationPressureFlag,PressureTendency,PressureTendencyFlag,PressureChange,PressureChangeFlag,SeaLevelPressure,SeaLevelPressureFlag,RecordType,RecordTypeFlag,HourlyPrecip,HourlyPrecipFlag,Altimeter,AltimeterFlag);
--filter out unwanted data
clearWeather = FILTER months BY SkyCondition == 'CLR';
--Transform and shape relation
shapedWeather = FOREACH clearWeather GENERATE Date,
SUBSTRING(Date,0,4) AS year,
SUBSTRING(Date,4,6) AS month,
SUBSTRING(Date,6,8) AS day,
SkyCondition,
DryBulbFarenheit AS dryTemp;
--Group relation specifying number of reducers
groupedByMonthDay = GROUP shapedWeather BY (month, day) PARALLEL 10;
--Aggregate relation
aggedResults = FOREACH groupedByMonthDay GENERATE group as MonthDay, AVG(shapedWeather.dryTemp), MIN(shapedWeather.dryTemp), MAX(shapedWeather.dryTemp), COUNT(shapedWeather.dryTemp) PARALLEL 10;
--Sort relation
sortedResults = ORDER aggedResults BY $1 DESC;
--Store results in HDFS
STORE sortedResults INTO 'hdfs:/data/big/data/weather/pigresults' USING PigStorage(':');
Output (based on your input samples above):
(01,01):21.615384615384617:21:23:13
MonthDay:(01,01)
Avg:21.615384615384617
Min:21
Max:23
Count:13
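The schema in your error message lists several fields with blank names (for example `:bytearray`), which suggests those fields never got an alias, so Pig could not find dryTemp by name. If declaring all 44 columns in the LOAD feels heavy, one alternative is to load without a schema and name only the columns you need by position. This is a minimal sketch that I have not run on your cluster. It assumes Date is $1, SkyCondition is $4 and DryBulbFarenheit is $10, counted from the header you posted. The relation name `weather` is only illustrative.

```pig
-- Load without a schema: fields have no names, only positions ($0, $1, ...)
months = LOAD 'hdfs:/data/big/data/weather/weather/20120[1-6]hourly.txt' USING PigStorage(',');
-- Name and cast only the needed columns (positions assumed from the posted header)
weather = FOREACH months GENERATE
    (chararray)$1 AS Date,
    (chararray)$4 AS SkyCondition,
    (int)$10 AS dryTemp;
-- Print the schema Pig holds for the relation, to confirm the field names exist
DESCRIBE weather;
-- The header row does not have 'CLR' in this column, so the filter drops it as well
clearWeather = FILTER weather BY SkyCondition == 'CLR';
```

Running DESCRIBE on a relation before referencing its fields is a quick way to check that a name such as dryTemp is really part of its schema.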