I'm trying to recover notes I took on an iPad on top of a PDF, which I saved as a new PDF before the application crashed. The new PDF was corrupted, but I managed to repair it so that it contains all my notes (highlights and handwritten margin notes) but not the original PDF content.
I am trying to use the fitz library (a.k.a. PyMuPDF) to recover the full notes by overlaying the original PDF with my notes (using an alpha mask of 50% so that I can see through my highlights).
Unfortunately, I could not manage to overlay the two pages with transparency. The notes page always masks the original PDF, so I only see the highlights and handwritten notes on a BLANK page.
Example of 1 page:
I have tried the following code and a few variants without success (note in the following code I'm only trying to create one page -- page 276 -- of the whole document to speed up the test):
import fitz  # PyMuPDF

journal_document = fitz.open(journal_path)  # type: ignore
notes_document = fitz.open(notes_path)  # type: ignore
combined_document = fitz.open()  # type: ignore

for page_num in range(len(journal_document)):
    if page_num < 276:
        continue
    # load pages to overlay
    journal_page = journal_document.load_page(page_num)
    notes_page = notes_document.load_page(page_num)
    # extract bottom image
    journal_pix = journal_page.get_pixmap()
    journal_image = fitz.Pixmap(journal_pix, 0)
    # create a new page in output doc
    combined_page = combined_document.new_page(width=journal_page.rect.width,
                                               height=journal_page.rect.height)
    combined_page.show_pdf_page(journal_page.rect, journal_document, page_num)
    # extract notes to be overlaid
    notes_pix = notes_page.get_displaylist().get_pixmap()
    notes_image = fitz.Pixmap(notes_pix)
    notes_image.set_alpha(bytearray(int(128)) * 595 * 842)
    # insert the notes image on the new page
    combined_page.insert_image(notes_page.rect, stream=notes_image.tobytes(),
                               alpha=int(128))
    print(f"page {page_num} saved...")
    break

combined_document.save(output_path)
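One detail in the `set_alpha` call above is worth checking in isolation: `bytearray(int(128))` builds 128 zero bytes rather than a single byte of value 128, so the buffer passed to `set_alpha` has neither the intended value nor one byte per pixel. The plain-Python sketch below only illustrates that difference. The 4 x 3 size is a made-up stand-in for the real pixmap dimensions, which would be safer to read from the pixmap's `width` and `height` than to hard-code as 595 x 842.

```python
# Hypothetical tiny image size, for illustration only.
width, height = 4, 3

zeros = bytearray(int(128))   # integer argument: 128 bytes, all of value 0
single = bytearray([128])     # list argument: 1 byte of value 128

# What the expression in the question builds: far too long, and all zeros.
question_buffer = zeros * width * height

# One alpha byte per pixel, each set to 128 (about 50%).
per_pixel_buffer = single * (width * height)

print(len(question_buffer), set(question_buffer))
print(len(per_pixel_buffer), set(per_pixel_buffer))
```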
Thanks @furas, Pillow did the trick. Still, I'm fascinated by the arcane complexity of PDFs and would love to also find a solution using PyMuPDF: the code above seemed so close to stacking two images with an alpha mask and transparency...
For the record, here is the Pillow snippet that worked, where image1 and image2 are the pixmaps extracted as above (loaded from the buffers buf_img1 and buf_img2) and page_num is the page number being iterated over:
from PIL import Image
image1 = Image.open(buf_img1)
image2 = Image.open(buf_img2)
mask = Image.new('L', image1.size, 128) # 128 corresponds to 50% transparency
result = Image.composite(image1, image2, mask)
result.save(f'images/p{page_num}.jpg')
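For a PyMuPDF-only variant, a minimal sketch might look like the following. It rests on several assumptions that are not from the original post: the helper names `alpha_plane` and `overlay_notes` are made up for illustration, the notes pages are assumed to render on a plain white background, and the `opaque` keyword of `Pixmap.set_alpha` (which, as far as I know, makes pixels of the given color fully transparent) is assumed to be available, which I believe requires PyMuPDF 1.18.13 or later. The idea is to keep the original page as the bottom layer, render the notes to a pixmap, add an alpha channel with one byte per pixel, and insert that pixmap on top.

```python
def alpha_plane(width, height, opacity=0.5):
    """Return one alpha byte per pixel for a uniform opacity in [0, 1]."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return bytes([round(opacity * 255)]) * (width * height)


def overlay_notes(journal_path, notes_path, output_path, opacity=0.5):
    """Stack each notes page, semi-transparent, over the matching journal page."""
    import fitz  # PyMuPDF; imported here so alpha_plane() works without it

    journal = fitz.open(journal_path)
    notes = fitz.open(notes_path)
    combined = fitz.open()

    for page_num in range(min(len(journal), len(notes))):
        journal_page = journal.load_page(page_num)
        notes_page = notes.load_page(page_num)

        page = combined.new_page(width=journal_page.rect.width,
                                 height=journal_page.rect.height)
        # Bottom layer: the original page, kept as vector content.
        page.show_pdf_page(page.rect, journal, page_num)

        # Top layer: render the notes without alpha, then add an alpha channel.
        rgb = notes_page.get_pixmap(alpha=False)
        rgba = fitz.Pixmap(rgb, 1)
        # Assumption: opaque=(255, 255, 255) makes white pixels transparent,
        # while the remaining pixels take the uniform alpha value.
        rgba.set_alpha(alpha_plane(rgba.width, rgba.height, opacity),
                       opaque=(255, 255, 255))
        page.insert_image(page.rect, pixmap=rgba)

    combined.save(output_path)
```

A call would then be `overlay_notes(journal_path, notes_path, output_path)`. Without the `opaque` argument, the white background of the notes page would also be blended at 50% and wash out the original, much like the Pillow composite above.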