I am having a problem running a script in Apache Pig. I have three files: movies.csv, ratings.csv and tags.csv. I want to load movies.csv first, then load ratings.csv and join the two tables, but I get an error when loading the files. My code is as follows:
register 'piggybank-0.15.0.jar'
DEFINE CSVLoader org.apache.pig.piggybank.storage.CSVLoader();
part1 = LOAD '/home/cloudera/ml-20m/movies' as (movieId: chararray, title: chararray, genre: chararray);
cat part1;
When I run the "cat" command, I get the following error:
ERROR 2997: Encountered IOException. Directory part1 does not exist.
java.io.IOException: Directory part1 does not exist.
at org.apache.pig.tools.grunt.GruntParser.processCat(GruntParser.java:677)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:233)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
at org.apache.pig.Main.run(Main.java:547)
at org.apache.pig.Main.main(Main.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
But the file exists at the specified location, and I don't know why Pig does not recognize the input file. I also tried placing the input file in HDFS and loading it from there, but the error is the same. Can anyone please help me? Thanks in advance.
part1 is not a file but a relation. When you use the LOAD command in Pig, you are instructing it to load the contents of the file into a relation. cat is a Grunt shell command that prints the contents of files, so it treats part1 as a path, which is why the error says "Directory part1 does not exist". To display the contents of part1, use
DUMP part1;
Alternatively, if you do want to use cat, give it the full path to the file instead of the relation name:
cat /home/cloudera/ml-20m/movies;
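Since the end goal is to join movies with ratings, here is a rough sketch of how the whole flow might look. Treat it as an illustration under assumptions rather than a tested script: the ratings path and the ratings field names are my guesses (they are not in the question), the loader is referenced by its full class name so the DEFINE line is not needed, and the FILTER steps only matter if the files still contain their header row.

```pig
-- Sketch only: the ratings path and the ratings schema are assumptions.
REGISTER 'piggybank-0.15.0.jar';

-- Load each file into a relation, parsing it as CSV.
movies = LOAD '/home/cloudera/ml-20m/movies'
         USING org.apache.pig.piggybank.storage.CSVLoader()
         AS (movieId:chararray, title:chararray, genre:chararray);

ratings = LOAD '/home/cloudera/ml-20m/ratings'
          USING org.apache.pig.piggybank.storage.CSVLoader()
          AS (userId:chararray, movieId:chararray, rating:chararray, ts:chararray);

-- Drop the header row, if the files have one.
movies_data  = FILTER movies  BY movieId != 'movieId';
ratings_data = FILTER ratings BY movieId != 'movieId';

-- Join the two relations on the movie id.
joined = JOIN movies_data BY movieId, ratings_data BY movieId;

-- Inspect the schema and print only a few rows instead of the whole join.
DESCRIBE joined;
preview = LIMIT joined 10;
DUMP preview;
```

Nothing actually runs until DUMP (or STORE) is reached, so a wrong input path only shows up as an error at that point, not at the LOAD line.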