
Efficiently Removing a Single Page from a Large Multi-page TIFF with JPEG Compression in Python


I am working with a large multi-page TIFF file that is JPEG-compressed, and I need to remove a single page from it. I am using the tifffile Python package to process the TIFF, and I already know which page I want to remove based on metadata tags associated with that page. My current approach is to read all pages, modify the target page (either by skipping or replacing it), and write the rest back to a new TIFF file.

Here’s what I’ve tried so far:

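import tifffile

# Rewrites every page except the one to drop into a new file. Note that this
# decodes and re-encodes every JPEG page (lossy, slow and memory-hungry).
with tifffile.TiffFile('input.tif') as tif, \
        tifffile.TiffWriter('output.tif', bigtiff=True) as writer:
    for page in tif.pages:
        # Example condition only: adapt it to the tags that identify your page.
        description = page.tags.get('ImageDescription')
        if description is not None and 'label' in str(description.value):
            continue  # skip the page to delete

        # asarray() decodes the whole page into RAM; JPEG-compressed data
        # cannot be memory-mapped.
        image_data = page.asarray()

        # Write the page to the output file, keeping its tiling and description
        writer.write(
            image_data,
            compression='jpeg',
            photometric=page.photometric,
            tile=(page.tilelength, page.tilewidth) if page.is_tiled else None,
            description=description.value if description is not None else None,
            metadata=None,  # don't add tifffile's own JSON description
        )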

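However, this approach decodes every page into memory and re-encodes it as JPEG, which is lossy, needs a lot of memory for large pages, and takes a very long time for the whole file.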

Is there any way in Python to efficiently remove a single page from a large multi-page TIFF file without consuming too much memory or taking forever? I've seen some .NET packages that can delete a page in place. Does Python have a similar solution?
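In-place removal is possible in plain Python because the pages of a TIFF file (its IFDs) form a linked list: the header stores the offset of the first IFD, and each IFD ends with the offset of the next one (0 for the last). Removing a page therefore only requires overwriting one offset so that the chain skips it, with no image decoding at all. The sketch below assumes a well-formed classic TIFF or BigTIFF and only handles the top-level chain, not SubIFDs; `list_ifds` and `unlink_page` are hypothetical helper names, not part of tifffile. It edits the file in place, so run it on a copy. The removed page's bytes stay in the file (which does not shrink), and readers that infer a page's role from its position may need checking afterwards.

```python
import struct


def _read_header(f):
    """Return (byte-order prefix, is_bigtiff, first IFD offset, position of that offset)."""
    f.seek(0)
    head = f.read(16)
    order = {b"II": "<", b"MM": ">"}.get(head[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    (magic,) = struct.unpack(order + "H", head[2:4])
    if magic == 42:  # classic TIFF: 4-byte offset to the first IFD at byte 4
        return order, False, struct.unpack(order + "I", head[4:8])[0], 4
    if magic == 43:  # BigTIFF: 8-byte offset to the first IFD at byte 8
        return order, True, struct.unpack(order + "Q", head[8:16])[0], 8
    raise ValueError("not a TIFF file")


def list_ifds(f):
    """Return [(ifd_offset, pointer_pos)] for the top-level IFD chain.

    pointer_pos is where the offset of that IFD is stored: in the header
    for the first page, in the previous IFD's next-IFD field otherwise.
    """
    order, big, offset, pointer_pos = _read_header(f)
    if big:
        count_fmt, entry_size, ptr_fmt = order + "Q", 20, order + "Q"
    else:
        count_fmt, entry_size, ptr_fmt = order + "H", 12, order + "I"
    ifds, seen = [], set()
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain contains a cycle")
        seen.add(offset)
        ifds.append((offset, pointer_pos))
        f.seek(offset)
        (n_entries,) = struct.unpack(count_fmt, f.read(struct.calcsize(count_fmt)))
        # The next-IFD field follows the entry count and the n_entries entries.
        pointer_pos = offset + struct.calcsize(count_fmt) + n_entries * entry_size
        f.seek(pointer_pos)
        (offset,) = struct.unpack(ptr_fmt, f.read(struct.calcsize(ptr_fmt)))
    return ifds


def unlink_page(path, index):
    """Drop page `index` from the IFD chain by re-pointing its predecessor.

    Modifies the file in place. The page's tags and pixel data stay in the
    file as unreferenced bytes, so the file does not get smaller.
    """
    with open(path, "r+b") as f:
        order, big, _, _ = _read_header(f)
        ifds = list_ifds(f)
        if not 0 <= index < len(ifds):
            raise IndexError("page index out of range")
        if len(ifds) == 1:
            raise ValueError("cannot remove the only page")
        _, pointer_pos = ifds[index]
        # Whatever pointed at the removed page now points at the page after it
        # (or 0 if the removed page was the last one).
        next_offset = ifds[index + 1][0] if index + 1 < len(ifds) else 0
        f.seek(pointer_pos)
        f.write(struct.pack(order + ("Q" if big else "I"), next_offset))
```

The index can come from the same tag check used in the question, since tifffile's `tif.pages` follows the same top-level IFD chain: find the index with tifffile in read-only mode, close the file, then call `unlink_page(path, index)`.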


Solution

  • I've created a Python package to handle it. While it can be made more extensible, it efficiently solves the problem without loading all the image data into memory.

    Core Idea:

    The package works by:

    Installation: You can install the package directly from PyPI:

    pip install tiff-wsi-label-removal
    

    Usage: Once installed, you can use the remove-label command-line tool to remove labels from a TIFF file:

    remove-label <input_tiff_file> <output_tiff_file>
    

    Current Limitations and Future Work:

    The package is functional, but there’s room for improvement, including making it more extensible.

    Suggestions from the comments here and additional planned features are tracked in the package description on PyPI.

    Any feedback is welcome!
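One caveat applies whenever a page is removed only by re-linking the IFD chain instead of rewriting the file: its compressed pixel data remains in the file and can be recovered. Where the label must really be gone, for example for de-identification, the page's strip or tile bytes can be overwritten first. The sketch below is an illustration using only the standard library, not how tiff-wsi-label-removal works; `scrub_page_data` and its helpers are hypothetical names. It assumes a well-formed classic TIFF or BigTIFF, finds the page by its index in the top-level IFD chain, reads StripOffsets/StripByteCounts (tags 273/279) or TileOffsets/TileByteCounts (tags 324/325), and writes zeros over those byte ranges. It does not touch SubIFDs or other tags such as ImageDescription, and because the zeroed tiles are no longer valid JPEG data, the page should be unlinked or the file rewritten afterwards. Run it on a copy, since it modifies the file in place.

```python
import struct

# Field types that can hold strip/tile offsets and byte counts.
_TYPE_FMT = {3: "H", 4: "I", 16: "Q"}  # SHORT, LONG, LONG8 (BigTIFF only)
# (offsets tag, byte-counts tag): StripOffsets/StripByteCounts, TileOffsets/TileByteCounts
_DATA_TAGS = ((273, 279), (324, 325))


def _read_ifd(f, order, big, ifd_offset):
    """Return ({tag: (type, count, raw value field)}, next IFD offset)."""
    count_fmt, entry_fmt, ptr_fmt = ("Q", "HHQ8s", "Q") if big else ("H", "HHI4s", "I")
    f.seek(ifd_offset)
    (n_entries,) = struct.unpack(order + count_fmt, f.read(struct.calcsize(order + count_fmt)))
    entry_size = struct.calcsize(order + entry_fmt)
    entries = {}
    for _ in range(n_entries):
        tag, typ, count, raw = struct.unpack(order + entry_fmt, f.read(entry_size))
        entries[tag] = (typ, count, raw)
    # The next-IFD offset directly follows the last entry.
    (next_offset,) = struct.unpack(order + ptr_fmt, f.read(struct.calcsize(order + ptr_fmt)))
    return entries, next_offset


def _values(f, order, big, typ, count, raw):
    """Decode an entry's values, following its offset if they don't fit inline."""
    if typ not in _TYPE_FMT:
        raise ValueError(f"unexpected field type {typ}")
    fmt = order + str(count) + _TYPE_FMT[typ]
    size = struct.calcsize(fmt)
    if size <= len(raw):  # small values are stored in the entry itself
        return list(struct.unpack(fmt, raw[:size]))
    (pos,) = struct.unpack(order + ("Q" if big else "I"), raw)
    f.seek(pos)
    return list(struct.unpack(fmt, f.read(size)))


def scrub_page_data(path, index):
    """Overwrite the strip/tile bytes of top-level page `index` with zeros.

    Modifies the file in place and returns the number of bytes zeroed.
    """
    if index < 0:
        raise IndexError("page index out of range")
    with open(path, "r+b") as f:
        head = f.read(16)
        order = {b"II": "<", b"MM": ">"}.get(head[:2])
        if order is None:
            raise ValueError("not a TIFF file")
        (magic,) = struct.unpack(order + "H", head[2:4])
        if magic not in (42, 43):
            raise ValueError("not a TIFF file")
        big = magic == 43
        # First IFD offset: 8 bytes at byte 8 (BigTIFF) or 4 bytes at byte 4.
        (ifd,) = struct.unpack(order + "Q", head[8:16]) if big else struct.unpack(order + "I", head[4:8])
        for _ in range(index):  # follow next-IFD offsets to the requested page
            _, ifd = _read_ifd(f, order, big, ifd)
            if ifd == 0:
                raise IndexError("page index out of range")
        entries, _ = _read_ifd(f, order, big, ifd)
        zeroed = 0
        for offsets_tag, counts_tag in _DATA_TAGS:
            if offsets_tag in entries and counts_tag in entries:
                offsets = _values(f, order, big, *entries[offsets_tag])
                counts = _values(f, order, big, *entries[counts_tag])
                for pos, size in zip(offsets, counts):
                    f.seek(pos)
                    f.write(b"\0" * size)
                    zeroed += size
        return zeroed
```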