I'm new to Python and I'm getting the error below when running the following code, which aims to take the contents of this PDF and put them in an Excel document. My OS is Windows 10 and I'm using VS Code via Anaconda3. I'm not sure what I'm doing wrong. Thank you all in advance.
FileNotFoundError: [WinError 2] The system cannot find the file specified
import tabula
file_path = (r"C:\Users\shattv\anaconda3\envs\venv1\TestInvoice.pdf")
oup = (r"C:\Users\shattv\anaconda3\envs\venv1\test.xlsx")
df = tabula.read_pdf(file_path,pages="all")
df.to_excel (oup)
I checked os.getcwd() and got the same directory: C:\Users\shattv\anaconda3\envs\venv1. Below are screenshots of the Excel and PDF files. I also tried switching to forward slashes and still got this error:
"C:/Users/shattv/anaconda3/envs/venv1/TestInvoice.pdf"
Try this: remove the r prefix in front of the file path.
file_path = ("C:/Users/user/anaconda3/envs/venv1/TestInvoice.pdf")
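As a quick check of what the r prefix actually changes, here is a minimal sketch using the standard library's pathlib. It reuses the hypothetical path from the line above and only compares the two spellings; nothing is read from disk. On Windows, a raw string with backslashes and a plain string with forward slashes name the same file, so if one spelling fails with FileNotFoundError, the other most likely will too.

```python
from pathlib import PureWindowsPath

# Hypothetical path taken from the snippet above; nothing is read from disk.
raw_backslashes = PureWindowsPath(r"C:\Users\user\anaconda3\envs\venv1\TestInvoice.pdf")
forward_slashes = PureWindowsPath("C:/Users/user/anaconda3/envs/venv1/TestInvoice.pdf")

# PureWindowsPath treats both separators alike, so the two spellings compare equal.
same_path = raw_backslashes == forward_slashes
print(same_path)
```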
These should work. If the above two do not work, try this:
import os.path
file_path = ("C:/Users/user/anaconda3/envs/venv1/TestInvoice.pdf")
# True only if the path points to an existing regular file
is_file = os.path.isfile(file_path)
print(is_file)
If this prints False, then Python cannot locate the file; in that case, follow this tutorial. If it prints True, try installing Java and putting it on your PATH. tabula-py is a simple Python wrapper of tabula-java, which can read tables in a PDF and then change their format. Since it is a wrapper around a Java program, you need Java installed as well as the Python package.
Once you have both, it should work. If not, I do not know how to fix that.
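The two checks above (does the PDF exist, and can Java be found) can be combined into one small script. This is only a sketch and not part of tabula-py: diagnose is a hypothetical helper name, the path is the placeholder used earlier, and it assumes tabula-py starts Java by looking up a java executable on PATH, which would explain a [WinError 2] even when the PDF itself exists.

```python
import os.path
import shutil


def diagnose(pdf_path):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    # Check 1: the PDF must exist at exactly this path.
    if not os.path.isfile(pdf_path):
        problems.append(f"PDF not found: {pdf_path}")
    # Check 2 (assumption): a 'java' executable must be discoverable on PATH.
    if shutil.which("java") is None:
        problems.append("java executable not found on PATH")
    return problems


if __name__ == "__main__":
    # Placeholder path from the answer; replace it with your own.
    for problem in diagnose("C:/Users/user/anaconda3/envs/venv1/TestInvoice.pdf"):
        print(problem)
```

If the only message printed is the one about java, the path is fine and the missing piece is the Java installation or its PATH entry.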