python pymupdf

How to remove scale from PDF file using Python


I have PDF files that contain line diagrams. I want to extract each line diagram from the PDF and convert it to a single SVG file. The problem is that each file has a margin scale, and I want to remove it.

Currently I am using:

import fitz  # PyMuPDF

doc = fitz.open(pdf_path)
page = doc[0]  # Get the first page (index 0)
svg_content = page.get_svg_image(matrix=fitz.Matrix(1, 1))

to convert the page to SVG, which works well, but I don't need the full PDF canvas. I want to crop the image to the diagram and save the result wherever I want.


Solution

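  • One approach is to take the union of the bounding boxes of all vector drawings on the page, pad it slightly, set it as the page's crop box, and then export the SVG: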

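    import fitz  # PyMuPDF
    
    def extract_cropped_svg(pdf_path, output_svg_path, page_number=0):
        doc = fitz.open(pdf_path)
        page = doc[page_number]
    
        # Bounding rectangles of all vector drawings on the page
        drawings = page.get_drawings()
        bounds = [d["rect"] for d in drawings if d["rect"].is_valid]
        if not bounds:
            print("No vector drawings found")
            doc.close()
            return
    
        # Combine the bounding boxes into one enclosing rectangle
        crop_rect = bounds[0]
        for r in bounds[1:]:
            crop_rect |= r
    
        # Pad by 2 points on each side, then clamp to the page:
        # set_cropbox() raises ValueError if the box is not inside the MediaBox.
        # (Clamping with page.rect assumes an unrotated page.)
        crop_rect = (crop_rect + (-2, -2, 2, 2)) & page.rect
    
        page.set_cropbox(crop_rect)
    
        svg = page.get_svg_image(matrix=fitz.Matrix(1, 1))
        with open(output_svg_path, "w", encoding="utf-8") as f:
            f.write(svg)
        doc.close()
    
        print(f"SVG saved to {output_svg_path}")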
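
One caveat: the function above unions the bounding box of every drawing, so if the margin scale is itself made of vector paths, it ends up inside the crop. Below is a minimal sketch of one way to exclude it, assuming the scale sits in a band of fixed width along the page edges. The helper names `drop_margin_rects` and `union_rects` and the `margin` value of 40 points are hypothetical; tune the margin for your files. The helpers work on plain `(x0, y0, x1, y1)` tuples so the logic can be checked without a PDF.

```python
def drop_margin_rects(rects, page_rect, margin=40.0):
    """Keep only rects lying entirely inside the page shrunk by `margin` on every side."""
    px0, py0, px1, py1 = page_rect
    ix0, iy0, ix1, iy1 = px0 + margin, py0 + margin, px1 - margin, py1 - margin
    kept = []
    for x0, y0, x1, y1 in rects:
        # Anything touching the margin band (scale ticks, page frame) is dropped
        if x0 >= ix0 and y0 >= iy0 and x1 <= ix1 and y1 <= iy1:
            kept.append((x0, y0, x1, y1))
    return kept


def union_rects(rects):
    """Smallest rectangle containing all rects, or None if the list is empty."""
    if not rects:
        return None
    xs0, ys0, xs1, ys1 = zip(*rects)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


# Made-up example: two scale strips along the top and left edges, two diagram parts
page_rect = (0, 0, 600, 800)
rects = [(5, 5, 595, 20), (5, 5, 20, 795), (100, 150, 400, 500), (380, 450, 520, 640)]
crop = union_rects(drop_margin_rects(rects, page_rect))
```

With PyMuPDF you would pass `[d["rect"] for d in page.get_drawings()]` as `rects` and `page.rect` as `page_rect`, then call `page.set_cropbox(fitz.Rect(crop))` if `crop` is not `None`. This only works if the diagram itself stays clear of the margin band; otherwise you would need a different rule for telling scale paths apart from diagram paths.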