[SOLVED] Batch convert PDFs to CSVs

What am I doing wrong? Here is the code that I attempted:

import glob
import tabula

for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
    tabula.convert_into(filepath, pages="all", output_format='csv')

Error:

TypeError                                 Traceback (most recent call last)
Input In [11], in <cell line: 6>()
      5 # transform the pdfs into excel files
      6 for filepath in glob.iglob('C:/Users/username/Downloads/folder with space/myfolderwithpdfs/*.pdf'):
----> 7     tabula.convert_into(filepath, pages="all", output_format='csv')

TypeError: convert_into() missing 1 required positional argument: 'output_path'

Solution

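The error occurs because convert_into() requires an output_path argument, and your call only passes the input file. The code below reads every PDF in your Downloads folder and writes the tables from each one to a CSV file in the same folder.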

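import os
import glob
import tabula

path = "/Users/username/Downloads/"
for filepath in glob.glob(path + "*.pdf"):
    # Drop the ".pdf" extension so report.pdf becomes report.csv, not report.pdf.csv
    name = os.path.splitext(os.path.basename(filepath))[0]
    tabula.convert_into(input_path=filepath,
                        output_path=path + name + ".csv",
                        output_format="csv",
                        pages="all")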
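
If you want something reusable, here is a rough sketch of the same loop wrapped in a function, using pathlib so that folders with spaces and the .pdf to .csv renaming are handled for you. The names csv_path_for and batch_convert, and the convert hook, are my own and not part of tabula; the hook only exists so the path logic can be tried out without Java installed. Treat it as a starting point rather than a finished tool.

```python
from pathlib import Path


def csv_path_for(pdf_path):
    """Return the sibling .csv path for a PDF, e.g. report.pdf -> report.csv."""
    return Path(pdf_path).with_suffix(".csv")


def batch_convert(folder, convert=None):
    """Convert every *.pdf directly inside `folder`; return the output paths.

    `convert` is a hypothetical hook (not a tabula feature): any callable
    taking (input_path, output_path). It defaults to tabula.convert_into.
    """
    if convert is None:
        # Imported lazily so the path handling can be tested without tabula/Java.
        import tabula

        def convert(input_path, output_path):
            tabula.convert_into(input_path, output_path,
                                output_format="csv", pages="all")

    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        out = csv_path_for(pdf)
        convert(str(pdf), str(out))  # output_path is always supplied here
        outputs.append(out)
    return outputs
```

Calling batch_convert("C:/Users/username/Downloads/folder with space/myfolderwithpdfs") should then write one CSV next to each PDF. tabula-py also ships convert_into_by_batch(input_dir, output_format="csv", pages="all"), which may be enough if you do not need control over the output names.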