python indexing nan numpy-ndarray numpy-slicing

Removing NaNs from a 3D array without reshaping my data


I have a 3D array of shape (121, 512, 1024) made up of 121 frames of 512x1024 images. The bottom several rows of each image contain NaNs, which mess up my processing. I want to remove these rows and end up with something like (121, 495, 1024).

I have been trying variations of

arr[:, ~np.isnan(arr).any(0)]

but I end up with (121, 508252).

I also tried arr[np.argwhere(~np.isnan(arr))]

but I got a MemoryError.

This seems like a simple and common task, but all the examples I have been able to find are for 2D arrays. Any help would be appreciated.

Edit: Alternatively, if I do reshape, I need the new shape to be detected automatically. Depending on the transformations applied to the images, trimming the NaNs could lead to arrays of different shapes, such as (121, 495, 1000). Within one image stack (for example, the 121 frames), all images will have identical shapes, so the trimmed array will still be rectangular.

The problem is that I cannot predict whether the NaNs form a straight line, or whether they are at the top, sides, or bottom of the image (they should always be at the edges). I basically have to cut further into the image to get a straight row or column; I just need to figure out where the new edges should be.

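import numpy as np

arr = np.ones([121, 512, 1024])
arr[:, 497:512, 0:675] = np.nan
arr[:, 496:512, 676:1024] = np.nan

# Attempt 3: collect the row and column indices that hold any valid value
trim = np.argwhere(~np.isnan(arr))
rows = np.unique(trim[:, 1])
cols = np.unique(trim[:, 2])
# The 2-D mask flattens the last two axes into one
result_arr = arr[:, ~np.isnan(arr).all(0)]
# Fails: the remaining values do not fill a (-1, len(rows), len(cols)) block
result_arr.shape = -1, len(rows), len(cols)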

For example, this does not work because the NaN rows do not line up around column 675, so the number of remaining cells does not match these dimensions.


Solution

  • One way to keep the resulting shape consistent is to work with whole rows and columns: identify the rows and columns that are NaN-free across all frames and keep only those.

    1. Find the rows and columns that do not contain NaN values across all frames.
    2. Trim the array to only include these rows and columns.

    Code explanation:

    1. np.isnan(arr).any(axis=(0, 2)) checks for NaN values across all frames and columns for each row. The result is a boolean array indicating whether a row contains any NaNs.
    2. Similarly, np.isnan(arr).any(axis=(0, 1)) identifies columns containing NaNs.
    3. Using these boolean masks, the original array is sliced to retain only the valid rows and columns.
    4. The resulting array has consistent dimensions across all frames.
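To make the difference between the two kinds of mask concrete, here is a small sketch on a toy array (the name `small` and its 2x3x4 shape are made up for illustration). It shows why a 2-D mask over rows and columns collapses both axes into one, which is what produces a result like (121, 508252), while a 1-D mask per axis keeps the array 3-D.

```python
import numpy as np

# Tiny stand-in for the image stack: 2 frames of 3x4, bottom row is NaN
small = np.ones((2, 3, 4))
small[:, 2, :] = np.nan

# A 2-D mask indexes rows and columns together, so both axes collapse into one
mask_2d = ~np.isnan(small).any(axis=0)       # shape (3, 4)
flattened = small[:, mask_2d]                # shape (2, 8)

# A 1-D mask per axis keeps the array 3-D
row_ok = ~np.isnan(small).any(axis=(0, 2))   # shape (3,)
kept = small[:, row_ok, :]                   # shape (2, 2, 4)

print(flattened.shape, kept.shape)
```

Reducing over a tuple of axes is what yields the 1-D mask: `axis=(0, 2)` leaves one boolean per row, and `axis=(0, 1)` leaves one per column.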
    import numpy as np
    
    # Create the example array
    arr = np.ones([121, 512, 1024])
    arr[:, 497:512, 0:675] = np.nan
    arr[:, 496:512, 676:1024] = np.nan
    
    # Find valid rows and columns
    valid_rows = ~np.isnan(arr).any(axis=(0, 2))  # Rows without any NaNs
    valid_cols = ~np.isnan(arr).any(axis=(0, 1))  # Columns without any NaNs
    
    # Trim the array
    trimmed_arr = arr[:, valid_rows, :][:, :, valid_cols]
    
    # Check the new shape
    print(trimmed_arr.shape)
    

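    For the given example this prints (121, 496, 1), not the hoped-for shape: rows 496 to 511 are dropped, but so is every column except 675, because each of the other columns contains a NaN somewhere in those bottom rows. Testing rows and columns independently over-trims whenever the NaNs sit in an edge band. Here the row mask alone is enough: arr[:, valid_rows, :] has shape (121, 496, 1024) and contains no NaNs.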
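If independent row and column masks remove more than intended, one alternative is to peel off whichever edge currently holds the largest share of NaNs, and repeat until none remain. The following is a minimal sketch rather than a definitive implementation: `trim_nan_edges` is a hypothetical helper name, and it assumes the NaNs only occur in bands touching the image edges, as the question describes. The example uses 3 frames instead of 121 to keep it light.

```python
import numpy as np

def trim_nan_edges(stack):
    """Crop a (frames, rows, cols) stack to a NaN-free rectangle by peeling edges.

    Hypothetical helper: assumes NaNs only appear in bands touching the edges.
    """
    # 2-D mask: True wherever any frame has a NaN at that pixel
    bad = np.isnan(stack).any(axis=0)
    top, bottom = 0, bad.shape[0]
    left, right = 0, bad.shape[1]

    while top < bottom and left < right:
        window = bad[top:bottom, left:right]
        if not window.any():
            break  # the current rectangle is NaN-free
        # Fraction of NaN pixels along each of the four current edges
        fractions = {
            "top": window[0, :].mean(),
            "bottom": window[-1, :].mean(),
            "left": window[:, 0].mean(),
            "right": window[:, -1].mean(),
        }
        worst = max(fractions, key=fractions.get)
        # Shrink the rectangle by one row or column on the worst edge
        if worst == "top":
            top += 1
        elif worst == "bottom":
            bottom -= 1
        elif worst == "left":
            left += 1
        else:
            right -= 1

    # Basic slicing returns a view, so no copy of the stack is made here
    return stack[:, top:bottom, left:right]

# Same NaN layout as the question, with fewer frames
arr = np.ones((3, 512, 1024))
arr[:, 497:512, 0:675] = np.nan
arr[:, 496:512, 676:1024] = np.nan

trimmed = trim_nan_edges(arr)
print(trimmed.shape)
```

For this layout the sketch removes only the bottom rows 496 to 511 and keeps all 1024 columns. The new shape is detected from the data, so it also adapts when the NaN band is on a side or at the top. If NaNs occur in the interior of the image, the loop still terminates but may cut away far more than desired.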