
Python: How to properly use readline() and readlines()


I've built a Python script that randomly creates sentences using data from the Princeton English WordNet, following diagrams provided in Gödel, Escher, Bach. Running python GEB.py produces a list of nonsensical sentences in English, such as:

resurgent inaesthetic cost. the bryophytic fingernail. aversive fortieth peach. the asterismal hide. the flour who translate gown which take_a_dare a punch through applewood whom the renewed request enfeoff. an lobeliaceous freighter beside tuna.

The script then saves them to gibberish.txt. This part works fine.

Another script (translator.py) takes gibberish.txt and, through the py-googletrans Python module, tries to translate those random sentences into Portuguese:

from googletrans import Translator
import json

tradutor = Translator()

with open('data.json') as dataFile:
    data = json.load(dataFile)


def buscaLocal(keyword):
    if keyword in data:
        print(keyword + data[keyword])
    else:
        buscaAPI(keyword)


def buscaAPI(keyword):
    result = tradutor.translate(keyword, dest="pt")
    data.update({keyword: result.text})

    with open('data.json', 'w') as fp:
        json.dump(data, fp)

    print(keyword + result.text)


keyword = open('/home/user/gibberish.txt', 'r').readline()
buscaLocal(keyword)

Currently the second script outputs only the translation of the first sentence in gibberish.txt. Something like:

resurgent inaesthetic cost. aumento de custos inestético.

I have tried to use readlines() instead of readline(), but I get the following error:

Traceback (most recent call last):
  File "main.py", line 28, in <module>
    buscaLocal(keyword)
  File "main.py", line 11, in buscaLocal
    if keyword in data:
TypeError: unhashable type: 'list'

I've read similar questions about this error here, but it is not clear to me what I should use in order to read the whole list of sentences contained in gibberish.txt (each new sentence begins on a new line).

How can I read the whole list of sentences contained in gibberish.txt, and how should I adapt the code in translator.py to achieve that? I am sorry if the question is a bit confusing; I can edit it if necessary. I am a Python newbie and would appreciate it if someone could help me out.


Solution

  • Let's start with what you're doing to the file object. You open a file, get a single line from it, and then don't close it. A better way to do it would be to process the entire file and then close it. This is generally done with a with block, which will close the file even if an error occurs:

    with open('gibberish.txt') as f:
        ...  # process f here; the file is closed automatically when the block exits
    

    Besides guaranteeing that the file gets closed, this makes the code clearer, since f is now a named object rather than a throwaway. You have three easy options for processing the entire file:

    1. Call readline in a loop, since it reads only one line at a time. You will have to strip off the trailing newline manually and end the loop when readline returns the empty string '', which signals end of file:

      while True:
          line = f.readline()
          if not line: break
          keyword = line.rstrip()
          buscaLocal(keyword)
      

      This loop can take many forms, one of which is shown here.

    2. Use readlines to read all the lines of the file at once into a list of strings:

      for line in f.readlines():
          keyword = line.rstrip()
          buscaLocal(keyword)
      

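      This is much cleaner than the previous option, since you don't need to check for loop termination manually, but it has the disadvantage of loading the entire file into memory at once, which the readline loop does not. It also explains the error in the question: readlines returns a list, so passing its result straight to buscaLocal makes keyword a list, and a list cannot be used as a dict key.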

      This brings us to the third option.

    3. Python files are iterable objects. You can have the cleanliness of the readlines approach with the memory savings of readline:

      for line in f:
          buscaLocal(line.rstrip())
      

      This approach can also be emulated with readline, using the more arcane two-argument form of iter, iter(callable, sentinel), which calls f.readline repeatedly and stops as soon as it returns the sentinel '':

      for line in iter(f.readline, ''):
          buscaLocal(line.rstrip())
      
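    To check that these options are interchangeable, here is a small self-contained sketch. It assumes io.StringIO as an in-memory stand-in for gibberish.txt (a real file object from open() supports the same readline, readlines and iteration calls), uses made-up sample text, and collects the stripped lines instead of calling buscaLocal. The last part reproduces the TypeError from the question.

```python
import io

# Hypothetical sample text standing in for the contents of gibberish.txt.
SAMPLE = (
    "resurgent inaesthetic cost.\n"
    "the bryophytic fingernail.\n"
    "aversive fortieth peach.\n"
)


def with_readline_loop(f):
    """Option 1: call readline() until it returns '' at end of file."""
    lines = []
    while True:
        line = f.readline()
        if not line:
            break
        lines.append(line.rstrip())
    return lines


def with_readlines(f):
    """Option 2: load every line into a list first, then loop over it."""
    return [line.rstrip() for line in f.readlines()]


def with_iteration(f):
    """Option 3: iterate over the file object itself, one line at a time."""
    return [line.rstrip() for line in f]


def with_iter_sentinel(f):
    """Option 3 via iter(callable, sentinel): call f.readline until it returns ''."""
    return [line.rstrip() for line in iter(f.readline, "")]


readers = (with_readline_loop, with_readlines, with_iteration, with_iter_sentinel)
# Give each reader a fresh in-memory "file" so they all start at the beginning.
results = {reader.__name__: reader(io.StringIO(SAMPLE)) for reader in readers}
for name, lines in results.items():
    print(name, lines)

# The question's error: readlines() returns a list, and a list cannot be
# hashed, so testing it for membership in a dict raises TypeError.
data = {}
error_message = None
try:
    io.StringIO(SAMPLE).readlines() in data
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

    All four readers print the same list of three sentences; only their memory behavior differs.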

    As a side point, I would make some modifications to your functions:

    def buscaLocal(keyword):
        if keyword not in data:
            buscaAPI(keyword)
        print(keyword + data[keyword])
    
    def buscaAPI(keyword):
        # Make your function do one thing. In this case, do a lookup.
        # Printing is not the task for this function.
        result = tradutor.translate(keyword, dest="pt")
        # No need to do a complicated update with a whole new
        # dict object when you can do a simple assignment.
        data[keyword] = result.text
    
    # ... loop over gibberish.txt here, calling buscaLocal on each line ...
    
    # Avoid rewriting the file every time you get a new word.
    # Do it once at the very end.
    with open('data.json', 'w') as fp:
        json.dump(data, fp)
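    Putting these suggestions together, translator.py could look roughly like the sketch below. It is only an illustration under some assumptions: so that it runs without network access, the googletrans call is replaced by a hypothetical fake_translate function (in the real script you would use tradutor.translate(keyword, dest="pt").text instead), and buscaLocal here is a variant that takes the cache and the translate function as parameters. The file names gibberish.txt and data.json come from the question; main and the temporary-directory demo are illustrative.

```python
import json
import os
import tempfile


def fake_translate(text):
    """Hypothetical stand-in for tradutor.translate(text, dest="pt").text."""
    return "[pt] " + text


def buscaLocal(keyword, data, translate):
    """Return the cached translation, calling translate() only on a cache miss."""
    if keyword not in data:
        data[keyword] = translate(keyword)
    print(keyword, data[keyword])
    return data[keyword]


def main(gibberish_path, cache_path, translate=fake_translate):
    # Load the existing cache, or start with an empty one on the first run.
    if os.path.exists(cache_path):
        with open(cache_path) as fp:
            data = json.load(fp)
    else:
        data = {}

    # Iterate over the file object: one line at a time, closed automatically.
    with open(gibberish_path) as f:
        for line in f:
            keyword = line.rstrip()
            if keyword:  # skip blank lines
                buscaLocal(keyword, data, translate)

    # Write the cache once, after every line has been processed.
    with open(cache_path, "w") as fp:
        json.dump(data, fp)
    return data


if __name__ == "__main__":
    # Demo in a temporary directory so no real files are touched.
    with tempfile.TemporaryDirectory() as tmp:
        gibberish = os.path.join(tmp, "gibberish.txt")
        cache = os.path.join(tmp, "data.json")
        with open(gibberish, "w") as f:
            f.write("resurgent inaesthetic cost.\nthe bryophytic fingernail.\n")
        main(gibberish, cache)
```

    Passing translate in as a parameter is just a design choice for the sketch: it lets you exercise the file handling and the JSON cache offline, then plug in the real translator once that part works.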