I'm trying to build a simple way to extract data from a PDF into a pandas DataFrame. Something like this:
import camelot
import pandas as pd
pdf = camelot.read_pdf("file1.pdf")
print(pdf[0].df)
I'm testing with two different files, File 1 and File 2, but I can't extract the data from the second one. It has more columns, but I don't think that should be a problem.
Also, the only way I could get a table out of File 2 was with flavor="stream".
Result for File 1
Result for File 2
To extract the tables from the second file correctly, Camelot needs to process the background lines. Pass process_background=True, a parameter of the lattice method (the default flavor), as in the following code:
import camelot
tables = camelot.read_pdf('file2.pdf', process_background=True)
for table in tables:
    print(table.df)
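If you have to handle several PDFs and don't know in advance which settings each one needs, one option is to try the settings in order: plain lattice, lattice with process_background=True, and finally stream. The following is only a minimal sketch under that assumption. The helper name extract_tables and its reader parameter are hypothetical (they are not part of Camelot); the only Camelot calls used are camelot.read_pdf with its flavor, pages and process_background arguments.

```python
def extract_tables(path, reader=None, pages="1"):
    """Try progressively more permissive settings until some table is found.

    `reader` is a hypothetical hook so the helper can be exercised without
    Camelot installed; by default it is camelot.read_pdf.
    """
    if reader is None:
        import camelot  # imported lazily, only when no reader is supplied
        reader = camelot.read_pdf

    # Order of attempts is an assumption: the cheapest lattice call first,
    # then lattice with background lines, then stream as a last resort.
    attempts = [
        {"flavor": "lattice"},
        {"flavor": "lattice", "process_background": True},
        {"flavor": "stream"},
    ]
    for kwargs in attempts:
        tables = reader(path, pages=pages, **kwargs)
        if len(tables) > 0:
            return kwargs, tables
    return None, []


# Example usage (requires Camelot and a real file):
# settings, tables = extract_tables("file2.pdf")
# print("settings used:", settings)
# for table in tables:
#     print(table.parsing_report)  # accuracy / whitespace figures per table
#     print(table.df)
```

Note that "a table was found" does not mean "the table is correct": lattice can return a badly parsed table without process_background, so it is still worth checking table.parsing_report and the DataFrame itself before trusting the result.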