[SOLVED] PDF table to pandas data frame using camelot

I'm trying to find a simple way to get data from a PDF into a pandas DataFrame, something like this:

import camelot
import pandas as pd

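# read_pdf returns a TableList; by default it parses only page 1 with the lattice flavor
pdf = camelot.read_pdf("file1.pdf")

# Each Table exposes its contents as a pandas DataFrame via .df
print(pdf[0].df)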
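camelot's `.df` uses integer column labels, and the header text of the PDF table usually ends up in row 0. A minimal sketch, assuming the first row really holds the header, that promotes it to column names (`promote_header` is a hypothetical helper, not part of camelot):

```python
import pandas as pd


def promote_header(df: pd.DataFrame) -> pd.DataFrame:
    """Use the first row of a camelot-style DataFrame as its column names.

    Assumes row 0 holds the header text, which is common but not guaranteed.
    """
    header = [str(value).strip() for value in df.iloc[0]]
    # Drop the header row and renumber the remaining rows from 0
    body = df.iloc[1:].reset_index(drop=True)
    body.columns = header
    return body


# Usage with camelot (requires a real PDF):
# import camelot
# tables = camelot.read_pdf("file1.pdf")
# df = promote_header(tables[0].df)
```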

The problem is that I'm testing with two different files, File 1 and File 2, and for the second file I can't extract any data. It has more columns, but I don't think that should be a problem.

Also, the only way I could get a table out of File 2 was by using flavor="stream".
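To see this difference directly, a small sketch can report how many tables each flavor detects in the same file. `count_tables_by_flavor` and its `reader` argument are hypothetical; the reader defaults to `camelot.read_pdf` and is injectable only so the function can be exercised without camelot or a PDF:

```python
from typing import Callable, Dict, Iterable, Optional


def count_tables_by_flavor(
    path: str,
    flavors: Iterable[str] = ("lattice", "stream"),
    reader: Optional[Callable] = None,
) -> Dict[str, int]:
    """Return how many tables each camelot flavor detects in `path`."""
    if reader is None:
        # Imported lazily so the function can be tested without camelot installed
        import camelot
        reader = camelot.read_pdf
    # A camelot TableList supports len(), giving the number of tables found
    return {flavor: len(reader(path, flavor=flavor)) for flavor in flavors}


# Usage (requires camelot and the PDF):
# print(count_tables_by_flavor("file2.pdf"))
```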

Result for File 1

Result for File 2

Solution

To extract the tables from the second file correctly, you need to process the background lines with the process_background parameter of the lattice flavor (camelot's default), as in the following code:

import camelot

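# process_background=True also detects ruling lines drawn in the background
# (it only applies to the lattice flavor, which is camelot's default)
tables = camelot.read_pdf("file2.pdf", process_background=True)

# Print every table camelot found as a pandas DataFrame
for table in tables:
    print(table.df)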
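If the goal is a single pandas DataFrame rather than one per table, the per-table frames can be concatenated. A minimal sketch, assuming every table has the same column layout (`combine_tables` is a hypothetical helper; it only relies on each table having a `.df` attribute):

```python
import pandas as pd


def combine_tables(tables) -> pd.DataFrame:
    """Concatenate the .df of every table into one DataFrame.

    Assumes all tables share the same columns; pandas fills any
    mismatched columns with NaN.
    """
    frames = [table.df for table in tables]
    if not frames:
        return pd.DataFrame()
    # ignore_index renumbers rows 0..n-1 across all tables
    return pd.concat(frames, ignore_index=True)


# Usage (requires camelot and the PDF):
# import camelot
# tables = camelot.read_pdf("file2.pdf", process_background=True)
# all_rows = combine_tables(tables)
```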