How do I specify N when using NLineInputFormat in Hadoop Streaming?
hadoop jar /home/Software/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
-D stream.non.zero.exit.is.failure=false \
-D mapred.map.tasks=2 \
-D mapred.reduce.tasks=1 \
-files /home/hello.py \
-input /hello.txt \
-output /result \
-mapper "/home/.conda/envs/perimeter-pytorch2/bin/python hello.py" \
-inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
-????
Which option lets me specify N?
The newer-API class is org.apache.hadoop.mapreduce.lib.input.NLineInputFormat (the mapred package is the older MapReduce API).
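Per the Javadoc for that class, N is set through the configuration property mapreduce.input.lineinputformat.linespermap. Pass it as -D mapreduce.input.lineinputformat.linespermap=N alongside the other -D options, before streaming options such as -input.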
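As a rough sketch, the command from the question with that property added might look like the following. N=100 is an arbitrary illustrative value, not something from the original post. The -inputformat class is kept exactly as in the question, because whether the streaming jar accepts the newer mapreduce class has not been verified here. The script only prints the assembled arguments (a dry run) rather than submitting a job.

```shell
#!/bin/sh
# Hypothetical value for N (lines handed to each map task); choose your own.
N=100

# Assemble the streaming command as positional parameters.
# All -D generic options come first, before -files/-input/-output/-mapper.
set -- hadoop jar /home/Software/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.6.0.jar \
  -D stream.non.zero.exit.is.failure=false \
  -D mapred.map.tasks=2 \
  -D mapred.reduce.tasks=1 \
  -D mapreduce.input.lineinputformat.linespermap="$N" \
  -files /home/hello.py \
  -input /hello.txt \
  -output /result \
  -mapper "/home/.conda/envs/perimeter-pytorch2/bin/python hello.py" \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat

# Dry run: print one argument per line.
# To actually submit the job, replace the next line with: "$@"
printf '%s\n' "$@"
```

Keeping the arguments in "$@" preserves the quoted -mapper string as a single argument, which a plain string variable would not do.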
If you'd like to use PyTorch with data stored in HDFS, I'd suggest Spark or Flink rather than MapReduce.