I have a PDF that is already in blob storage. I need to highlight a few lines in it and store the result as a new PDF (again in blob storage). I could not find how to do this in the answers listed below. Here is the pseudocode:
import fitz

def edit_pdfs(path_to_pdf_from_blob, new_path_to_pdf_from_blob):
    # READ pdf from blob storage
    doc = fitz.open(path_to_pdf_from_blob)
    # EDIT doc (fitz.fitz.Document): I already have working code for this step, omitted here to keep things simple
    # WRITE pdf to blob storage
    doc.save(new_path_to_pdf_from_blob)
Answers already seen:
Access data within the blob storage without downloading
How can I read a text file from Azure blob storage directly without downloading it to a local file(using python)?
Azure Blobstore: How can I read a file without having to download the whole thing first?
I tried this in my environment and got the results below.
Initially, I had one PDF document named important.pdf in my container.
You can use the code below to edit the PDF entirely in memory, so nothing is written to a local file.
Code:
from io import BytesIO

import fitz  # PyMuPDF
from azure.storage.blob import BlobServiceClient

connection_string = "your-connection-string"
blob_name = "important.pdf"
new_blob_name = "demo.pdf"

blob_service_client = BlobServiceClient.from_connection_string(connection_string)
blob_client = blob_service_client.get_blob_client(container="test", blob=blob_name)

# Read the PDF into memory as bytes (readall() replaces the deprecated content_as_bytes())
pdf_bytes = blob_client.download_blob().readall()

# Open the document from the in-memory bytes instead of a file path
doc = fitz.open(stream=pdf_bytes, filetype="pdf")

# Highlight a rectangular area on the first page (coordinates are in points)
page = doc[0]
rect = fitz.Rect(50, 50, 200, 200)
highlight = page.add_highlight_annot(rect)
# Regenerate the annotation's appearance after it is created or changed
highlight.update()

# Save the edited document into an in-memory buffer
modified_pdf_stream = BytesIO()
doc.save(modified_pdf_stream)
modified_pdf_bytes = modified_pdf_stream.getvalue()
doc.close()

# Upload the edited PDF as a new blob
new_blob_client = blob_service_client.get_blob_client(container="test", blob=new_blob_name)
new_blob_client.upload_blob(modified_pdf_bytes, overwrite=True)

# Optional: delete the original blob (skip this step if you want to keep it)
blob_client.delete_blob()
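Output: after the script runs, the test container holds demo.pdf, a copy of important.pdf with the highlight on its first page, and important.pdf itself has been deleted.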
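To get something closer to the edit_pdfs function sketched in the question, the download, edit, and upload steps can be wrapped in one helper. This is only a sketch: edit_blob_in_memory and its edit parameter are hypothetical names, and the two client arguments are assumed to be BlobClient objects (or anything offering the same download_blob().readall() and upload_blob(data, overwrite=True) calls).

```python
def edit_blob_in_memory(source_client, target_client, edit):
    """Download a blob, apply `edit` to its bytes, and upload the result.

    source_client, target_client: BlobClient-like objects (assumption).
    edit: hypothetical callable taking PDF bytes and returning edited bytes.
    Returns the number of bytes uploaded.
    """
    # Pull the whole blob into memory; no local file is created
    original = source_client.download_blob().readall()
    # Apply the caller's edit, e.g. a function that opens the bytes with fitz
    edited = edit(original)
    # Write the result to the target blob, replacing any existing content
    target_client.upload_blob(edited, overwrite=True)
    return len(edited)
```

Passing the edit step in as a function keeps the storage code separate from the PDF code, so the existing editing logic can be reused as long as it accepts and returns bytes. The source blob is left untouched; call delete_blob() on it afterwards if it should be removed.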