I want to split a multi-page PDF file into single pages using Python. The script is shown below:
#!/usr/bin/python3
from PyPDF2 import PdfFileWriter, PdfReader
inputpdf = PdfReader(open("/pdf2xls/split_pdf/xyz.pdf", "rb"))
for i in range(inputpdf.numPages):
    output = PdfFileWriter()
    output.addPage(inputpdf.getPage(i))
    with open("/pdf2xls/split_pdf/xyz-page%s.pdf" % (i+1), "wb") as outputStream:
        output.write(outputStream)
However, it's throwing the following error:
Traceback (most recent call last):
  File "/root/scripts/pdf2xls/Test/T2/pdf_split.py", line 7, in <module>
    for i in range(inputpdf.numPages):
  File "/usr/local/lib/python3.9/dist-packages/PyPDF2/_reader.py", line 467, in numPages
    deprecation_with_replacement("reader.numPages", "len(reader.pages)", "3.0.0")
  File "/usr/local/lib/python3.9/dist-packages/PyPDF2/_utils.py", line 369, in deprecation_with_replacement
    deprecation(DEPR_MSG_HAPPENED.format(old_name, removed_in, new_name))
  File "/usr/local/lib/python3.9/dist-packages/PyPDF2/_utils.py", line 351, in deprecation
    raise DeprecationError(msg)
PyPDF2.errors.DeprecationError: reader.numPages is deprecated and was removed in PyPDF2 3.0.0. Use len(reader.pages) instead.
How can I fix this? Please guide me.
You just need to follow the error messages.
PyPDF2 was deprecated and development moved back to pypdf. When we made that switch, we ensured that the pypdf classes / methods / functions follow the typical Python naming scheme. So we renamed them from camelCase to snake_case. We also switched from getters (like .getPage(index)) to properties (like .pages[index]).
You can see that the pages property behaves like a list. If you want to get the number of pages, you just call len(reader.pages).
More information is in the official migration guide: https://pypdf2.readthedocs.io/en/3.0.0/user/migration-1-to-2.html
Did you read the error message? It tells you exactly what you should change.
Yes, I did that, and now I am getting the following error: PyPDF2.errors.DeprecationError: reader.getPage(pageNumber) is deprecated and was removed in PyPDF2 3.0.0. Use reader.pages[page_number] instead.
You need to work on them one by one. Don't worry, there are not too many, and the messages tell you exactly what to do.