python pdf tabula

How can I use tabula in a Python script to extract a table in more detail?


This is my code:

import tabula

# Specify the path to your PDF file
pdf_path = "path.pdf"

# Use tabula.read_pdf with the default auto method
tables = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)

# Print each table
for i, table in enumerate(tables):
    print(f"Table {i + 1}:\n{table}\n")

This is the result (image: table extracted using tabula (python script)).

But in the PDF, the table looks like this (image: table I want to extract in the PDF file).

How can I extract the table so that it matches this sample table?


Solution

  • I have found that setting the lattice parameter to True makes the table look better (image: table printed in the terminal after using the lattice parameter):

    tables = tabula.read_pdf(pdf_path, pages='all',
                             multiple_tables=True, lattice=True)
    

    However, there are still redundant columns, for example Unnamed: 0 at the beginning and Unnamed: 1 at the end. How can I improve this further?
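
One possible way to deal with the leftover columns is to clean each DataFrame with pandas after extraction. The sketch below assumes that the Unnamed: N columns are completely empty (all NaN), which I have not verified against the actual PDF. The helper name drop_empty_unnamed_columns and the sample DataFrame are made up for illustration; the sample only stands in for one of the tables returned by tabula.read_pdf.

```python
import pandas as pd


def drop_empty_unnamed_columns(table: pd.DataFrame) -> pd.DataFrame:
    """Drop columns whose header starts with 'Unnamed:' and whose cells are all NaN."""
    keep = [
        # Keep a column unless it is BOTH a placeholder header AND entirely empty.
        not (str(name).startswith("Unnamed:") and table.iloc[:, pos].isna().all())
        for pos, name in enumerate(table.columns)
    ]
    return table.loc[:, keep]


# Hypothetical stand-in for one DataFrame returned by tabula.read_pdf(...)
sample = pd.DataFrame(
    {
        "Unnamed: 0": [None, None],
        "Item": ["A", "B"],
        "Price": [1.5, 2.0],
        "Unnamed: 1": [None, None],
    }
)

cleaned = drop_empty_unnamed_columns(sample)
print(list(cleaned.columns))  # ['Item', 'Price']
```

With the real data this would be applied to every table, e.g. tables = [drop_empty_unnamed_columns(t) for t in tables]. An Unnamed column that still holds values is kept on purpose, so no extracted data is lost silently.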