I have some PDFs that I need to extract information from. I am using Python on CentOS 7 with the slate library. In the beginning, slate worked fine. Then I had to update several modules and libraries, and slate stopped working. To fix it, I tried updating slate and tried several different versions, but none of them work. The error is:
  File "/usr/lib64/python2.7/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128)
When I remove slate from my code, everything works fine.
The piece of code where I use slate:
import slate

def adequacaoCut(pdf, person, pathInt, pathImg):
    # Open the PDF in binary mode and let slate extract its text
    with open('pdfs/'+pdf, 'rb') as f:
        doc = slate.PDF(f)
        print doc
    # ... rest of code that works fine
Version of slate: 0.5.2
Version of Python: 2.7
As time passed, I no longer remember which libraries I installed or which updates I made to Python, CentOS, or anything else. What should I do?
I solved the problem myself. I discovered that I had two pdfminer packages installed on my computer (pdfminer and pdfminer.six). I think there was some kind of conflict between the two libraries, or slate was trying to call pdfminer.six instead of pdfminer. I uninstalled both and reinstalled only pdfminer. It works like a charm now.
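For anyone who wants to check for the same situation, here is a minimal sketch that reports which of the two distributions are installed. The helper name installed_versions is made up for this illustration and is not part of slate or pdfminer. Both distributions install a top-level package named pdfminer, so having both at once means one can shadow the other.

```python
from __future__ import print_function

# Pick a metadata lookup that works on the running interpreter.
try:
    # Python 3.8+
    from importlib.metadata import version as _version
    from importlib.metadata import PackageNotFoundError as _NotFound
except ImportError:
    # Python 2.7 fallback; assumes setuptools is installed
    import pkg_resources
    _NotFound = pkg_resources.DistributionNotFound

    def _version(name):
        return pkg_resources.get_distribution(name).version


def installed_versions(names):
    """Hypothetical helper: return {name: version} for each installed distribution."""
    found = {}
    for name in names:
        try:
            found[name] = _version(name)
        except _NotFound:
            # Distribution is not installed; leave it out of the result
            pass
    return found


found = installed_versions(["pdfminer", "pdfminer.six"])
print(found)
if len(found) > 1:
    print("Both distributions are installed and share the 'pdfminer' import name.")
```

If both show up, running pip uninstall pdfminer pdfminer.six followed by pip install pdfminer reproduces the fix described above.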