
UnicodeDecodeError when using slate


I have some PDFs that I need to extract information from. I am using Python on CentOS 7 with the slate library. In the beginning, slate worked fine. But then I had to update several modules and libraries, and since then slate doesn't work anymore. To solve the problem, I tried updating slate and tried several different versions, but none of them work. The error is:

  File "/usr/lib64/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128)
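For context: the Python 2.7 `StringIO.StringIO` documentation warns that mixing unicode and 8-bit strings in one buffer makes `getvalue()` raise a UnicodeError when any 8-bit string is not 7-bit ASCII. Byte 0xc3 is the lead byte of a two-byte UTF-8 sequence (for example, "é" is encoded as `b'\xc3\xa9'`), so some UTF-8 bytes ended up being decoded as ASCII. The minimal Python 3 sketch below reproduces the same kind of error by decoding UTF-8 bytes with the ASCII codec. The helper `ascii_decode_error` is hypothetical and only exists for illustration.

```python
def ascii_decode_error(data):
    """Decode bytes as ASCII; return the error message, or None on success."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


utf8_bytes = "café".encode("utf-8")  # b'caf\xc3\xa9': 'é' becomes 0xc3 0xa9

# ASCII only covers 0-127, so the 0xc3 lead byte at index 3 fails:
print(ascii_decode_error(utf8_bytes))
# → 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

# Decoding with the correct codec works:
print(utf8_bytes.decode("utf-8"))  # → café
```

In the traceback, the failing decode is implicit (inside `''.join`), which is why it points into `StringIO.py` rather than into my own code.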

When I remove slate from my code, everything works fine.

The piece of code where I use slate:

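import os
import slate

def adequacaoCut(pdf, person, pathInt, pathImg):
    # Open the PDF in binary mode and let slate extract its text
    with open(os.path.join('pdfs', pdf), 'rb') as f:
        doc = slate.PDF(f)
        print(doc)
        # ... rest of code that works fine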

Since this was a while ago, I no longer remember which libraries or updates (Python, CentOS, or anything else) I installed. What should I do?


Solution

  • I solved the problem myself. I discovered that I had two pdfminer packages on my computer (pdfminer and pdfminer.six). I think there was some kind of conflict between the libraries, or slate tried to call pdfminer.six instead of pdfminer. I uninstalled both and reinstalled only pdfminer. It works like a charm now.
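Both the pdfminer and pdfminer.six distributions install the same top-level `pdfminer` import package, so having both installed means one overwrites the other's files. A quick way to check for this is sketched below. It assumes Python 3.8+ (for `importlib.metadata`); on the Python 2.7 setup from the question, `pip list` or `pip freeze` shows the same information. The function names are hypothetical, chosen for illustration.

```python
import re
from importlib import metadata

# Normalized names of distributions that provide the "pdfminer" package.
PDFMINER_FAMILY = {"pdfminer", "pdfminer-six"}


def normalize(name):
    # PEP 503-style normalization: case-insensitive; '-', '_' and '.' are equivalent
    return re.sub(r"[-_.]+", "-", name).lower()


def pdfminer_variants(dist_names):
    """Return the sorted, normalized pdfminer-family names found in dist_names."""
    found = {normalize(n) for n in dist_names if n}
    return sorted(found & PDFMINER_FAMILY)


def installed_pdfminer_variants():
    """Scan the current environment's installed distributions."""
    names = (d.metadata["Name"] for d in metadata.distributions())
    return pdfminer_variants(names)


if __name__ == "__main__":
    variants = installed_pdfminer_variants()
    if len(variants) > 1:
        print("Conflict, more than one installed:", ", ".join(variants))
    else:
        print("pdfminer distributions found:", variants or "none")
```

If it reports both, the fix described above corresponds to `pip uninstall pdfminer pdfminer.six` followed by `pip install pdfminer`.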