python dataframe python-camelot tabula-py

Unable to extract tables with Tabula or Camelot


I tried to extract the table below using Tabula, but it returned an empty DataFrame. The same code works fine for other, similar tables.

[image of the table]

I also tried Camelot, but it didn't work either. Any suggestions on how I can extract this table?

My code is below:

from tabula import read_pdf
import pandas as pd

Page_No = 1
tables = read_pdf('/content/page1.pdf', pages=Page_No, multiple_tables=True)
df1 = pd.DataFrame(tables[0])
df1

import camelot

tables2 = camelot.read_pdf('page1.pdf', flavor='lattice', pages='1')
tables2

Solution

  • The issue was fixed by passing stream=True and guess=False to tabula's read_pdf.

    from tabula import read_pdf
    import pandas as pd

    Page_No = 1
    # stream=True forces stream-mode extraction (for tables without ruling lines);
    # guess=False turns off tabula's automatic guessing of the table area.
    tables = read_pdf('/content/page1.pdf', pages=Page_No, guess=False, stream=True)
    df1 = pd.DataFrame(tables[0])
    df1
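
If it is unclear in advance which settings a given PDF needs, one option is to try several in order and keep the first result that is not empty. The following is only a rough sketch of that idea. `first_nonempty` and `tabula_strategies` are hypothetical helper names, not part of tabula-py, and the only tabula call used is `read_pdf` with the arguments already shown in this thread.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def first_nonempty(
    strategies: Iterable[Tuple[str, Callable[[], List]]]
) -> Tuple[Optional[str], List]:
    """Run each named strategy in order; return the first that yields non-empty tables."""
    for name, extract in strategies:
        try:
            # Keep only tables that have at least one row.
            tables = [t for t in extract() if len(t) > 0]
        except Exception as exc:
            # A failing attempt should not stop the remaining ones.
            print(f"{name} failed: {exc}")
            continue
        if tables:
            return name, tables
    return None, []


def tabula_strategies(path: str, page: int):
    """Hypothetical helper: the tabula-py calls from the question and the answer."""
    # Imported lazily so first_nonempty can be used without tabula-py installed.
    from tabula import read_pdf

    return [
        ("default", lambda: read_pdf(path, pages=page, multiple_tables=True)),
        ("stream, guess=False",
         lambda: read_pdf(path, pages=page, stream=True, guess=False)),
    ]
```

It could then be called as `name, tables = first_nonempty(tabula_strategies('/content/page1.pdf', 1))`, after which `name` says which settings produced a table and `tables[0]` is the first extracted table.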