Tags: python, mapreduce, mrjob

Top Ten Values from a text file using mrjob


I am trying to get the top 10 values from a text file. The file looks something like this:

10000
10001
10002
10003
10004
10005
10090
10011
10060
10050
10040
11000
20000

Here is what I tried but it keeps on giving me an error:

from mrjob.job import MRJob

class MRWordCount(MRJob):
    
  def mapper(self,_,lines):
    for number in lines.split(','):
      yield None, number
  def reducer(self,key, numbers): 
    self.alist = []
    for number in numbers:
      self.alist.append(number)
    self.topten = []
    for i in range(10):
      self.topten.append(max(self.topten))
      self.alist.remove(max(self.alist))
    for i in range(10):
      yield self.topten[i]


if __name__ == '__main__':
    MRWordCount.run()

The error that I am getting is:

ValueError: too many values to unpack (expected 2)

What I want to do is sort the values in this file and then output the ten largest numbers, ordered from highest to lowest. Does anybody know how to do this with mrjob, or how to resolve the error I'm getting? Just to be clear, I'm not looking for the values that appear most frequently in the file, just the ten largest values in it.


Solution
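
A likely cause of the ValueError, assuming mrjob unpacks everything a reducer yields into a key and a value: the question's reducer yields a bare string such as "10000" instead of a (key, value) pair, so the unpacking sees five characters where it expects two items. The sketch below reproduces that kind of failure in plain Python; `unpack_pair` is a hypothetical stand-in for the framework's unpacking step, not part of mrjob.

```python
def unpack_pair(record):
    """Unpack a record into (key, value), as a framework expecting pairs would."""
    key, value = record
    return key, value


# A proper (key, value) pair unpacks cleanly.
pair = unpack_pair((None, "10000"))

# A bare 5-character string cannot be unpacked into exactly two names.
try:
    unpack_pair("10000")
    message = ""
except ValueError as exc:
    message = str(exc)

print(pair)
print(message)
```

Yielding `None, value` from the reducer instead of the value alone avoids this failure.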

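    from mrjob.job import MRJob


    class Top10Integers(MRJob):
        def mapper(self, key, line):
            # split() with no argument splits on whitespace, so this handles
            # one number per line as well as several numbers on one line.
            for integer in line.split():
                # Convert to int so values compare numerically, and use a
                # single key (None) so every value reaches the same reducer.
                yield None, int(integer)

        def reducer(self, key, values):
            # Sort from highest to lowest and keep the first ten.
            integers = list(values)
            integers.sort(reverse=True)

            integers = integers[:10]
            for integer in integers:
                # Always yield a (key, value) pair, never a bare value.
                yield None, integer


    if __name__ == '__main__':
        Top10Integers.run()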
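
To check the expected result without running a job, the same top-ten logic can be sketched in plain Python. This is only an illustration: `parse_integers` and `top_n` are hypothetical helper names, and `heapq.nlargest` is swapped in for the full sort used above, since it returns the n largest items already ordered from highest to lowest. The sample values are the ones listed in the question.

```python
import heapq


def parse_integers(lines):
    """Yield every whitespace-separated integer found in the given lines."""
    for line in lines:
        for token in line.split():
            yield int(token)


def top_n(lines, n=10):
    """Return the n largest integers, ordered from highest to lowest."""
    return heapq.nlargest(n, parse_integers(lines))


# Sample values copied from the question, one number per line.
sample_lines = [
    "10000", "10001", "10002", "10003", "10004", "10005", "10090",
    "10011", "10060", "10050", "10040", "11000", "20000",
]

for value in top_n(sample_lines):
    print(value)
```

For the sample above, the first value printed is 20000 and the last is 10003.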