python, scikit-learn, time-series, decomposition

How to plot ICA components extracted from EEG signal?


I'm following the answer to this question and this scikit-learn tutorial to remove artifacts from an EEG signal. They seem simple enough, so I must be missing something obvious here.

The components extracted don't have the same length as my signal. I have 88 channels of several hours of recordings, so the shape of my signal matrix is (88, 8088516), yet the output of ICA is (88, 88). In addition to being so short, each component seems to capture very large, noisy-looking deflections: out of 88 components only a couple actually look like signal, and the rest look like noise, whereas I would have expected only a few components to look noisy. I suspect I'm doing something wrong here?

The matrix of (channels x samples) has shape (88, 8088516).


Sample code (using a random matrix as a minimal working example):

import numpy as np
from sklearn.decomposition import FastICA
import matplotlib.pyplot as plt

samples_matrix = np.random.random((88, 8088516))

# Compute ICA
ica = FastICA(n_components=samples_matrix.shape[0])  # Extracting as many components as there are channels, i.e. 88
components = ica.fit_transform(samples_matrix)  # Reconstruct signals
A_ = ica.mixing_  # Get estimated mixing matrix

The shape of components is (88, 88). One plotted looks like this:

plt.plot(components[1])

[plot of components[1]: a short, noisy-looking trace]

I would have expected the components to be time series of the same length as my original signal, as shown in the answer to this question. I'm really not sure how to move forward with component removal and signal reconstruction at this point.


Solution

  • You need to run fit_transform on the transpose of your samples_matrix instead of samples_matrix itself (i.e. pass an 8088516 x 88 matrix to the method instead of an 88 x 8088516 one):

    ica.fit_transform(samples_matrix.transpose())
    

    or more simply

    ica.fit_transform(samples_matrix.T)
    

    which will give you an 8088516 x 88 set of signals (88 components, each a time series as long as the original) for plotting. As I mentioned in my comment below, due to the large matrix inversions, etc., my setup could handle no more than 64 components.
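
    Putting it together, here is a minimal sketch of how the whole workflow could look once the data is transposed: fit on (samples x channels), inspect the extracted sources, zero out the ones you judge to be artifacts, and map back to channel space with inverse_transform. The artifact indices below are placeholders, the 64-component limit follows the note above, and I've shortened the time axis so the sketch runs quickly.

    import numpy as np
    from sklearn.decomposition import FastICA

    # Stand-in for the real recording; shortened from (88, 8088516) so the sketch runs quickly
    samples_matrix = np.random.random((88, 100000))

    # FastICA expects (n_samples, n_features), i.e. (time points, channels)
    ica = FastICA(n_components=64, random_state=0)
    sources = ica.fit_transform(samples_matrix.T)  # shape (100000, 64): one full-length time series per component

    # Hypothetical artifact removal: zero out the components identified as artifacts
    # by visual inspection (the indices here are placeholders), then reconstruct
    artifact_idx = [0, 3]
    sources_clean = sources.copy()
    sources_clean[:, artifact_idx] = 0
    cleaned = ica.inverse_transform(sources_clean).T  # back to (channels, samples)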

    To back this suggestion up, looking at the tutorial you linked, they set up their toy problem as follows:

    import numpy as np
    from scipy import signal  # imports needed to run this excerpt

    n_samples = 2000
    time = np.linspace(0, 8, n_samples)
    
    s1 = np.sin(2 * time)  # Signal 1 : sinusoidal signal
    s2 = np.sign(np.sin(3 * time))  # Signal 2 : square signal
    s3 = signal.sawtooth(2 * np.pi * time)  # Signal 3: saw tooth signal
    
    S = np.c_[s1, s2, s3]
    S += 0.2 * np.random.normal(size=S.shape)  # Add noise
    
    S /= S.std(axis=0)  # Standardize data
    # Mix data
    A = np.array([[1, 1, 1], [0.5, 2, 1.0], [1.5, 1.0, 2.0]])  # Mixing matrix
    X = np.dot(S, A.T)  # Generate observations
    

    which gives an X of shape (2000, 3) from which to separate the 3 components, indicating that (n_samples, n_features) is the layout the FastICA method expects.
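
    Continuing that snippet, fitting FastICA on X then returns the recovered sources with one row per time point, which is exactly the full-length shape the question was expecting (a quick sketch using the variables defined above):

    from sklearn.decomposition import FastICA

    ica = FastICA(n_components=3, random_state=0)
    S_ = ica.fit_transform(X)  # recovered sources, shape (2000, 3): each column is a full-length component
    A_ = ica.mixing_           # estimated mixing matrix, shape (3, 3)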