python windows hadoop mrjob

Hadoop Found 2 unexpected arguments


I'm running Hadoop on Windows and trying to submit an MRJob job, but it fails with the error "Found 2 unexpected arguments on the command line".

(cmtle) d:\>python norad_counts.py -r hadoop --hadoop-streaming-jar C:\hadoop-3.3.0\share\hadoop\tools\lib\hadoop-streaming-3.3.0.jar all_files.txt
No configs found; falling back on auto-configuration
No configs specified for hadoop runner
Looking for hadoop binary in C:\hadoop-3.3.0\bin\bin...
Looking for hadoop binary in $PATH...
Found hadoop binary: C:\hadoop-3.3.0\bin\hadoop.CMD
Using Hadoop version 3.3.0
Creating temp directory C:\Users\mille\AppData\Local\Temp\norad_counts.mille.20210318.083636.028559
uploading working dir files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd...
Copying other local files to hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/
Running step 1 of 1...
  Found 2 unexpected arguments on the command line [hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py, hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh]
  Try -help for more information
  Streaming Command Failed!
Attempting to fetch counters from logs...
Can't fetch history log; missing job ID
No counters found
Scanning logs for probable cause of failure...
Can't fetch history log; missing job ID
Can't fetch task logs; missing application ID
Step 1 of 1 failed: Command '['C:\\hadoop-3.3.0\\bin\\hadoop.CMD', 'jar', 'C:\\hadoop-3.3.0\\share\\hadoop\\tools\\lib\\hadoop-streaming-3.3.0.jar', '-files', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/mrjob.zip#mrjob.zip,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/norad_counts.py#norad_counts.py,hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/wd/setup-wrapper.sh#setup-wrapper.sh', '-input', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/files/all_files.txt', '-output', 'hdfs:///user/mille/tmp/mrjob/norad_counts.mille.20210318.083636.028559/output', '-mapper', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --mapper', '-combiner', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --combiner', '-reducer', '/bin/sh -ex setup-wrapper.sh python3 norad_counts.py --step-num=0 --reducer']' returned non-zero exit status 1.

Here's the content of norad_counts.py:

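from mrjob.job import MRJob
import pandas as pd

class MRNoradCounts(MRJob):

    def mapper(self, _, file_path):
        # Each input line is the path of a gzipped CSV file; keep only
        # objects with MEAN_MOTION > 11.25 and ECCENTRICITY < 0.25.
        try:
            df = pd.read_csv(file_path, compression='gzip', low_memory=False)
            df = df[(df.MEAN_MOTION > 11.25) & (df.ECCENTRICITY < 0.25)]
        except Exception as exc:
            # Chain the original error instead of hiding it behind a bare except.
            raise RuntimeError(f'Failed to read or filter {file_path}') from exc
        for norad in df.NORAD_CAT_ID.to_list():
            yield norad, 1

    def combiner(self, norad, counts):
        # Pre-sum counts on the mapper side to reduce shuffle traffic.
        yield norad, sum(counts)

    def reducer(self, norad, counts):
        # Total number of matching records per NORAD catalog ID.
        yield norad, sum(counts)

if __name__ == "__main__":
    MRNoradCounts.run()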
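The job itself is a plain keyed count: filter rows, emit `(NORAD_CAT_ID, 1)`, pre-sum in the combiner, total in the reducer. To convince yourself the counting logic is fine and that the failure is purely on the Hadoop side, here is a minimal pure-Python sketch of that map, combine, shuffle, reduce flow. It uses neither mrjob nor pandas, and the sample records in `FILES` are hypothetical stand-ins for the CSV files listed in all_files.txt.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for the gzipped CSV files in all_files.txt.
# Each inner list plays the role of one file handled by one mapper call.
FILES = [
    [
        {"NORAD_CAT_ID": 25544, "MEAN_MOTION": 15.5, "ECCENTRICITY": 0.0003},
        {"NORAD_CAT_ID": 20580, "MEAN_MOTION": 15.1, "ECCENTRICITY": 0.0002},
        {"NORAD_CAT_ID": 36516, "MEAN_MOTION": 1.0, "ECCENTRICITY": 0.0001},  # too slow
    ],
    [
        {"NORAD_CAT_ID": 25544, "MEAN_MOTION": 15.5, "ECCENTRICITY": 0.0004},
        {"NORAD_CAT_ID": 99999, "MEAN_MOTION": 12.0, "ECCENTRICITY": 0.7},  # too eccentric
    ],
]


def mapper(rows):
    # Same filter as the job: keep fast, near-circular orbits and emit (norad, 1).
    for row in rows:
        if row["MEAN_MOTION"] > 11.25 and row["ECCENTRICITY"] < 0.25:
            yield row["NORAD_CAT_ID"], 1


def combiner(pairs):
    # Local pre-aggregation of one mapper's output.
    partial = defaultdict(int)
    for norad, count in pairs:
        partial[norad] += count
    return partial.items()


def reducer(norad, counts):
    # Final total for one key.
    return norad, sum(counts)


def run(files):
    # Map + combine per file, then shuffle (sort and group by key), then reduce.
    combined = [pair for rows in files for pair in combiner(mapper(rows))]
    combined.sort(key=itemgetter(0))
    return dict(
        reducer(norad, (count for _, count in group))
        for norad, group in groupby(combined, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run(FILES))  # {20580: 1, 25544: 2}
```

To exercise the real `MRNoradCounts` class without Hadoop, mrjob's `inline` and `local` runners (`-r inline`, `-r local`) are the usual way to isolate job bugs from cluster or environment problems.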

Solution

  • I fixed the issue by reinstalling the Java JDK. I had originally installed it to C:\Program Files\Java but later moved it to C:\Java, following some other instructions. I thought updating the environment variables would be enough, but apparently it wasn't. So I uninstalled Java and reinstalled it, this time directly to C:\Java, which fixed the issue.
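Since the root cause here was a JDK that no longer matched what the environment pointed to, a quick sanity check of `JAVA_HOME` before reinstalling may save time. The sketch below is a hypothetical helper (`check_java_home` is not part of Hadoop or mrjob). It checks that the variable is set, points to an existing directory, and contains `bin/java` (`java.exe` on Windows). The check for spaces reflects common advice that Hadoop's Windows `.cmd` scripts can misbehave with paths such as `C:\Program Files\...`; that part is an assumption, not something the poster confirmed.

```python
import os
import sys
from pathlib import Path


def check_java_home(java_home=None):
    """Hypothetical helper: return a list of problems with JAVA_HOME (empty list means it looks OK)."""
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]

    problems = []
    # Assumption: spaces in the path are a known trouble spot for Hadoop's .cmd scripts.
    if " " in java_home:
        problems.append(f"JAVA_HOME contains a space: {java_home!r}")

    home = Path(java_home)
    if not home.is_dir():
        # Typical symptom of a JDK that was moved after JAVA_HOME was set.
        problems.append(f"JAVA_HOME does not point to an existing directory: {java_home!r}")
    else:
        exe = "java.exe" if os.name == "nt" else "java"
        if not (home / "bin" / exe).is_file():
            problems.append(f"no bin/{exe} under JAVA_HOME: {java_home!r}")
    return problems


if __name__ == "__main__":
    issues = check_java_home()
    for issue in issues:
        print("WARNING:", issue)
    sys.exit(1 if issues else 0)
```

On Windows, changes to environment variables only take effect in newly opened console windows, so rerun the check (and the job) from a fresh prompt after editing them.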