python opencv tiff python-imaging-library vips

Extract tiles from tiled TIFF and store in numpy array


My overall goal is to crop several regions from an input mirax (.mrxs) slide image to JPEG output files.

Here is what one of these images looks like:

[image]

Note that the darker grey area is part of the image, and the regions I ultimately wish to extract in JPEG format are the 3 black square regions.

Now, for the specifics:

I'm able to extract the color channels from the mirax image into 3 separate TIFF files using vips on the command line:

vips extract_band INPUT.mrxs OUTPUT.tiff[tile,compression=jpeg] C --n 1

Where C corresponds to the channel number (0-2), and each output file is about 250 MB in size.

The next job is to somehow recognize and extract the regions of interest from the images, so I turned to several Python imaging libraries, and this is where I ran into difficulties.

When I try to load any of the TIFFs with OpenCV:

i = cv2.imread('/home/user/input_img.tiff',cv2.IMREAD_ANYDEPTH) 

I get the following error: (-211) The total matrix size does not fit to "size_t" type in function setSize

I managed to get a little more traction with Pillow, by doing:

from PIL import Image
tiff = Image.open('/home/user/input_img.tiff')
print(len(tiff.tile))
print(tiff.tile[0])
print(tiff.info)

which outputs:

636633
('jpeg', (0, 0, 128, 128), 8, ('L', ''))
{'compression': 'jpeg', 'dpi': (25.4, 25.4)}

However, beyond loading the image, I can't seem to perform any useful operations. For example, calling tiff.tostring() results in a MemoryError (I tried this in order to convert the PIL object to a numpy array). I'm not sure this operation is even valid given the existence of tiles.

From my limited understanding, these TIFFs store the image data in 'tiles' (of which the above image contains 636633) in a JPEG-compressed format.

It's not clear to me, however, how one would extract these tiles for use as regular JPEG images, or even whether the sequence of steps I outlined above is a useful way of accomplishing the overall goal of extracting the ROIs from the mirax image.

If I'm on the right track, then some guidance would be appreciated, or, if there's another way to accomplish my goal using vips/openslide without python I would be interested in hearing ideas. Additionally, more information about how I could deal with or understand the TIFF files I described would also be helpful.

The ideal solution would include:

1) Some kind of autocropping feature in vips/openslide which can generate JPEGs from either the TIFFs or the original mirax image, along the lines of what the following command does, but without generating tens of thousands of images:

vips dzsave CMU-1.mrxs[autocrop] pyramid

2) Being able to extract tiles from the TIFFs and store the data corresponding to the image region as a numpy array in order to detect the 3 ROIs using OpenCV or another method.


Solution

  • I would use the vips Python binding, pyvips. It's very like PIL but can handle these huge images. Try something like:

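    import pyvips
    
    # Placeholder region in full-resolution pixel coordinates; substitute your own ROI.
    left, top, width, height = 0, 0, 1024, 1024
    
    # rgb=True requests RGB rather than RGBA pixels from the openslide loader
    slide = pyvips.Image.new_from_file("my-slide.mrxs", rgb=True)
    tile = slide.crop(left, top, width, height)
    tile.write_to_file("tile.jpg")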
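
Since the question also asks for numpy arrays, here is a rough sketch of how a cropped region could be handed to numpy. The helper name `region_to_numpy` is made up for illustration, and it assumes the image holds 8-bit samples (which is what I would expect from an openslide image, but check yours):

```python
import numpy as np


def region_to_numpy(image, left, top, width, height):
    """Crop a region from a pyvips image and return it as a
    (height, width, bands) uint8 array.

    Assumes 8-bit samples; other formats would need a different dtype.
    """
    region = image.crop(left, top, width, height)
    # write_to_memory() renders only the cropped region, so memory use
    # is set by the crop size, not by the size of the whole slide.
    data = region.write_to_memory()
    return np.ndarray(
        buffer=data,
        dtype=np.uint8,
        shape=(region.height, region.width, region.bands),
    )
```

You would call it as `region_to_numpy(slide, left, top, width, height)` with a `slide` loaded by `pyvips.Image.new_from_file`. Recent pyvips releases also have a `numpy()` method on images, which may be simpler if your version provides it.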
    

    You can also extract areas on the command-line, of course:

    $ vips crop my-slide.mrxs[rgb] tile.jpg left top width height
    

    openslide attaches a lot of metadata to the image describing the layout and position of the various subimages. Try:

    $ vipsheader -a my-slide.mrxs
    

    And have a look through the output. You might be able to calculate the position of your subimages from that. I would also ask on the openslide mailing list; they are expert and helpful.

    One more thing you could try: get a low-res overview, corner-detect on that, then extract the tiles from the high-res image. To get a low-res version of your slide, try:

    $ vips copy my-slide.mrxs[level=7] overview.jpg
    

    Level 7 is downsampled by 2 ** 7, so 128x.
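
Once you have found a box in the overview, it needs scaling back up before you crop the full-resolution image. A minimal sketch of that arithmetic follows. The function name is hypothetical, and it assumes every level halves the previous one, as described above; if your slide differs, the per-level downsample values in the openslide metadata would be the thing to use instead.

```python
def overview_box_to_full_res(box, level):
    """Scale a (left, top, width, height) box found in a level-`level`
    overview to full-resolution (level 0) pixel coordinates.

    Assumes each level is downsampled by a factor of 2 from the one below.
    """
    scale = 2 ** level
    left, top, width, height = box
    return (left * scale, top * scale, width * scale, height * scale)


# A box found at (10, 20) with size 30x40 in the level 7 overview:
print(overview_box_to_full_res((10, 20, 30, 40), level=7))
# prints (1280, 2560, 3840, 5120)
```

The four values returned can be passed straight to `crop` as left, top, width and height.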