python csv tabula

Batch convert PDFs to CSVs


What am I doing wrong? Here is the code that I attempted:

import glob
import tabula

for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    tabula.convert_into(filepath, pages="all", output_format='csv')

Error:

TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 6>()
      5 # transform the pdfs into excel files
      6 for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
----> 7     tabula.convert_into(filepath, pages="all", output_format='csv')

TypeError: convert_into() missing 1 required positional argument: 'output_path'

Solution

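  • The error occurs because convert_into() requires an output_path argument, and your call does not pass one. The code below reads every PDF file in your Downloads folder and writes the tables from each one to a CSV file in the same folder.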

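    import glob
    import os
    import tabula

    path = "/Users/username/Downloads/"
    for filepath in glob.glob(path + "*.pdf"):
        # Drop the ".pdf" extension so the output is "name.csv", not "name.pdf.csv".
        name = os.path.splitext(os.path.basename(filepath))[0]
        # output_path is required; output_format already defaults to "csv".
        tabula.convert_into(input_path=filepath,
                            output_path=path + name + ".csv",
                            pages="all")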
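The same loop can be wrapped in a small function, which is handy if the folder name contains spaces (as in the question) or if you want to check the output names before running the real conversion. This is only a sketch: `csv_path_for` and `batch_convert` are hypothetical helper names, not part of tabula-py, and the `convert` parameter is an assumption added so the path logic can be tried without tabula or Java installed.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return the sibling .csv path for a PDF, e.g. a/report.pdf -> a/report.csv."""
    return Path(pdf_path).with_suffix(".csv")


def batch_convert(folder, convert=None):
    """Convert every *.pdf directly inside `folder`; return the CSV paths written."""
    if convert is None:
        # Imported lazily so the path handling can be tested without tabula-py.
        import tabula
        convert = tabula.convert_into

    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = csv_path_for(pdf)
        # The second argument is the output_path that the original call was missing.
        convert(str(pdf), str(out), output_format="csv", pages="all")
        written.append(out)
    return written
```

With tabula-py installed, `batch_convert("C:/Users/username/Downloads/folder with space/myfolderwithpdfs")` should write one CSV next to each PDF. tabula-py also provides `convert_into_by_batch(input_dir, output_format="csv", pages="all")`, which may be enough if you do not need control over the output names.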