I have a directory full of individual PDFs that need to be merged together, based on their name. Each individual pdf file has one page. The naming convention for each file consists of a string name and a number. This is roughly what my directory looks like:
A_001.pdf A_002.pdf A_003.pdf B_001.pdf B_002.pdf B_003.pdf B_004.pdf
I basically need one PDF for A (3 pages) and one PDF for B (4 pages). The numeric suffix (_001 and so forth) should determine the page number. My current Python script does output A.pdf and B.pdf, but each one includes pages from both A and B.
import PyPDF2, os
from PyPDF2 import PdfFileReader, PdfFileWriter, PdfFileMerger
from pathlib import Path

single_file_dir = r'Y:\Python\Single_PDFs'
binder_file_dir = r'Y:\Python\Combined_PDFs'

# get list of all files in the single PDF directory
single_file_list = []
for file in os.listdir(single_file_dir):
    if file.endswith(".pdf"):
        single_file_list.append(single_file_dir + "\\" + file)
print(single_file_list)

# get the file names for the output multi page pdfs
file_name_list = []
for file in single_file_list:
    name = os.path.basename(file)
    new_name = name[:-8]
    file_name_list.append(new_name)
unique_file_name_list = list(set(file_name_list))

merger = PdfFileMerger()
print(unique_file_name_list)

# try to match input single file name to output file name
for file in single_file_list:
    for name in unique_file_name_list:
        if name in file:
            merger.append(file)
            merger.write(binder_file_dir + "\\" + name + ".pdf")
This script does result in A.pdf and B.pdf, but both output PDFs include many duplicates of both the A single PDFs and the B single PDFs. My goal is to have A_001.pdf, A_002.pdf, A_003.pdf merged into one multi-page pdf. Same with the B series PDFs.
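I think your problem comes from reusing one merger object for every output file. A merger keeps every page that has been appended to it, so each write includes the pages from all files appended so far, regardless of which group they belong to. Creating a new merger for each group avoids that.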
This code is adapted from another script I use to merge pdfs. Let me know if it works for you.
from collections import defaultdict
from pathlib import Path

from PyPDF2 import PdfMerger

single_file_dir = Path("Y:/") / "Python" / "Single_PDFs"
binder_file_dir = Path("Y:/") / "Python" / "Combined_PDFs"

# Collect the single-page files under the group they belong to
file_groups: defaultdict[str, list[Path]] = defaultdict(list)
for file in single_file_dir.glob("*.pdf"):
    group = file.name[0]  # However you want to determine the group from the filename
    file_groups[group].append(file)

for group, files in file_groups.items():
    merger = PdfMerger()  # a new merger per group, so pages never carry over
    for file in sorted(files):  # zero-padded numbers sort correctly by name
        merger.append(str(file))
    with open(binder_file_dir / f"{group}.pdf", "wb") as binder:
        merger.write(binder)
    merger.close()  # release the input files held by this merger
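The grouping line above uses only the first character of the file name, which is enough for A and B but not for longer names. Here is a minimal sketch of one alternative, assuming every file follows the name_number.pdf pattern from the question: split the stem at the last underscore and order each group by the numeric suffix. The helper names split_name and group_files are made up for this illustration and are not part of PyPDF2.

```python
from collections import defaultdict
from pathlib import Path


def split_name(path: Path) -> tuple[str, int]:
    """Split 'A_001.pdf' into ('A', 1); assumes a '<name>_<number>.pdf' pattern."""
    group, _, number = path.stem.rpartition("_")
    return group, int(number)


def group_files(paths: list[Path]) -> dict[str, list[Path]]:
    """Map each group name to its files, ordered by page number."""
    groups: defaultdict[str, list[Path]] = defaultdict(list)
    for path in paths:
        groups[split_name(path)[0]].append(path)
    return {
        group: sorted(files, key=lambda p: split_name(p)[1])
        for group, files in groups.items()
    }


if __name__ == "__main__":
    # Example names only; "Site_Plan" shows a group name that itself contains an underscore
    names = ["B_002.pdf", "A_001.pdf", "Site_Plan_010.pdf", "Site_Plan_002.pdf", "B_001.pdf"]
    for group, files in group_files([Path(n) for n in names]).items():
        print(group, [f.name for f in files])
```

The resulting dictionary could take the place of file_groups in the merge loop. Sorting by the parsed number rather than by name also keeps the page order right if the numbers are not zero-padded.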
I like using the pathlib module to avoid dealing with the platform-specific idiosyncrasies of paths (especially the \ separators on Windows).
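As a small illustration of that point, the sketch below builds the same output path once with string concatenation and once with pathlib. It uses PureWindowsPath so the result is the same on any operating system; the directory is the one from the question, and the group name "A" is only an example.

```python
from pathlib import PureWindowsPath

binder_file_dir = r"Y:\Python\Combined_PDFs"
group = "A"  # example group name

# String concatenation: the separator has to be written (and escaped) by hand.
by_hand = binder_file_dir + "\\" + group + ".pdf"

# pathlib: the "/" operator inserts the separator for the path flavour in use.
with_pathlib = PureWindowsPath("Y:/") / "Python" / "Combined_PDFs" / f"{group}.pdf"

print(by_hand)
print(with_pathlib)
print(with_pathlib.stem, with_pathlib.suffix)
```

Both approaches produce the same path here, but the pathlib version also gives you attributes such as stem and suffix without any slicing of the file name.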