scikit-learnpcanilearn

Plot PCA components derived from sklearn.decomposition.PCA separately in three dimensional space


For my project, I work with three dimensional MRI data, where the fourth dimension represents different subjects (I use the package nilearn for this). I am using sklearn.decomposition.PCA to extract a given number of principal components from my data. Now I would like to plot the components separately on a brain image, that is, I would like to show a brain image with my extracted components (in this case, 2) in different colors.

Here’s an example code using the OASIS dataset, which can be downloaded via the nilearn API:

  1. masking using nilearn.input_data.NiftiMasker, which converts my 4 dimensional data into a 2 dimesional array (n_subjects x n_voxels).
  2. standardizing the data matrix using StandardScaler
  3. running the PCA using sklearn.decomposition.PCA:
## set workspace
import numpy as np

from nilearn.datasets import fetch_oasis_vbm
from nilearn.input_data import NiftiMasker
from nilearn.image import index_img

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

from nilearn import plotting

## Load Data  #################################################################

# take only first 30 subjects as example
oasis_dataset = fetch_oasis_vbm(n_subjects=30)
imgs = np.array(oasis_dataset['gray_matter_maps'])

## PIPELINE ###################################################################

# create a random number generator
rng = np.random.RandomState(42)

# Convert Images to 2D Data Array 
niftimasker = NiftiMasker(mask_strategy='template')

# z-standardize images
scaler = StandardScaler()

# Extract 2 Components
pca = PCA(n_components=2,
          svd_solver='full',
          random_state=rng)

# create pipeline
pipe = Pipeline([('niftimasker',niftimasker),
                 ('scaler',scaler),
                 ('pca',pca)])

# call fit_transform on pipeline
X = pipe.fit_transform(imgs)

As far as I understand what I obtain after running the PCA are the PCA loadings? Unfortunately, I don't understand how to get from this to two images, each containing one PCA component.


Solution

  • To get data back in to image format, you will need to do a NiftiMasker.inverse_transform(). To do so it is required that you preserve the dimensions in voxel space.

    So, the way the pipeline is working now, you are using dimensionality reduction on voxel space. Just in case you wanted to reduce dimensionality in subject space, you would change the following:

    pipe = Pipeline([('niftimasker',niftimasker),
                 ('scaler',scaler),
    #                  ('pca',pca)
                ])
    
    X = pipe.fit_transform(imgs)
    X_reduced = pca.fit_transform(X.T).T
    

    Then you would apply an inverse transform as follows:

    component_image = niftimasker.inverse_transform(X_reduced)
    

    Then to get each individual subject component image you will use index_image from nilearn.image. E.g. this is the image for the first subject component:

    component1_image = index_img(component_image,0)
    

    However, I think you are interested in reducing diemnsionality on voxel space. Therefore to preserve the voxel dimensions for the inverse transform, you will need to get the index of each voxel feature chosen in the PCA dimensionality reduction. Keep your pipeline the way you had it originally and do the following:

    X = pipe.fit_transform(imgs)
    
    components = pca.components_
    #In your case 2, but replace range(2) with range(n_components)
    most_important = [np.abs(components[i]).argmax() for i in range(2)]
    

    Then tile up nan arrays with x subjects and y voxels: (in your case 30 x 229007)

    comp1, comp2 = np.tile(np.nan, [30,229007]), np.tile(np.nan, [30,229007])
    for x,y in enumerate(X):
        comp1[x,most_important[0]] = y[0]
        comp2[x,most_important[1]] = y[1]
    

    Then apply the reverse transform on each component:

    component1_image = niftimasker.inverse_transform(comp1)
    component2_image = niftimasker.inverse_transform(comp2)
    

    You will now have 2 images, each with 30 subjects and 1 valid voxel value representing the chosen component. It is up to you how to aggregate the component voxel over the 30 subjects, in this case I am just going to use a mean image function from nilearn.image:

    mean_component1_image = mean_img(component1_image)
    mean_component2_image = mean_img(component2_image)
    

    Finally, in both cases plot the respective image. In the voxel reduced version you will see a small variation in the two images in the X dimension (second diagram) but hardly the Y and Z. I'm using plot_glass_brain from nilearn.plotting:

    plotting.plot_glass_brain(mean_component1_image)
    plotting.plot_glass_brain(mean_component2_image)
    

    To use overlays, adjust color maps to make this easier to visualize, and other plotting options consult this and other nilearn plotting guides:

    https://nilearn.github.io/plotting/index.html#different-display-modes

    Let me know if you have any more questions.