hadoopapache-pighigh-level

Apache Pig - Illustrate command error


]$ cat webccess.txt
mark,yahoo.com,6
sam,google.com,7
john,yahoo.com,3
patrick,cnn.com,8
mary,facebook.com,1
mark,yahoo.com,4
john,bbc.com,10
andrew,twitter.com,3
patrick,twitter.com,9

I am running below task in Cloudera Quick Vm Hue-Pig Shell(Grunt)

grunt> stage1 = LOAD '/user/cloudera/webaccess.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
grunt> DUMP stage1;
grunt> stage2 = FILTER stage1 by access >= 8;
grunt> stage3 = GROUP stage1 by name;
grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
grunt> DUMP stage4;

OUTPUT:

(sam,7)
(john,10)
(mark,6)
(mary,1)
(andrew,3)
(patrick,9)

Till this every thing is fine.

When I apply ILLUSTRATE command to review on relation stage4, I am getting error as shown below,

grunt> ILLUSTRATE stage4;

2014-10-07 04:02:43,639 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 04:02:43,642 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-10-07 04:02:43,643 [main] WARN org.apache.hadoop.conf.Configuration - dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
2014-10-07 04:02:43,643 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 04:02:43,799 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-10-07 04:02:43,800 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-10-07 04:02:43,804 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-10-07 04:02:43,805 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
at org.apache.pig.PigServer.getExamples(PigServer.java:1182)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:739)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:157)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2014-10-07 04:02:43,868 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
Details at logfile: /dev/null

I am in learning stage, due to this error, I was not able to move to next topic.

Also before starting this task the very first when I opened the Hue-Pig Shell(Grunt), I found the following warning.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.util.PlatformName. Program will exit.
which: no hadoop in ((null))
which: no /usr/lib/hadoop/bin/hadoop in ((null))
dirname: missing operand
Try `dirname --help' for more information.
2014-10-07 03:18:27,802 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.7.0 (rexported) compiled May 28 2014, 11:05:48
2014-10-07 03:18:27,803 [main] INFO org.apache.pig.Main - Logging error messages to: /dev/null
2014-10-07 03:18:28,758 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/cloudera/.pigbootup not found
2014-10-07 03:18:30,436 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-10-07 03:18:30,444 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost.localdomain:8020
2014-10-07 03:18:37,832 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost.localdomain:8021
2014-10-07 03:18:37,842 [main] WARN org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS

Solution

  • I didnt face any issue, illustrate command is working fine. Can you try to exectue in local mode first?

        $pig -x local
        grunt> stage1 = LOAD 'input.txt' USING PigStorage(',') AS (name:chararray, website:chararray, access:int);
        grunt> stage2 = FILTER stage1 by access >= 8;
        grunt> stage3 = GROUP stage1 by name;
        grunt> stage4 = FOREACH stage3 GENERATE group as GROUPS, MAX(stage1.access);
        grunt> DUMP stage4;
        (sam,7)
        (john,10)
        (mark,6)
        (mary,1)
        (andrew,3)
        (patrick,9)
        grunt> ILLUSTRATE stage4;
        ----------------------------------------------------------------------------
        | stage1     | name:chararray     | website:chararray     | access:int     | 
        ----------------------------------------------------------------------------
        |            | john               | yahoo.com             | 3              | 
        |            | john               | bbc.com               | 10             | 
        ----------------------------------------------------------------------------
        --------------------------------------------------------------------------------------------------------------------------
        | stage3     | group:chararray     | stage1:bag{:tuple(name:chararray,website:chararray,access:int)}                     | 
        --------------------------------------------------------------------------------------------------------------------------
        |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
        |            | john                | {(john, yahoo.com, 3), (john, bbc.com, 10)}                                         | 
        --------------------------------------------------------------------------------------------------------------------------
        ------------------------------------------------
        | stage4     | GROUPS:chararray     | :int     | 
        ------------------------------------------------
        |            | john                 | 10       | 
        ------------------------------------------------