Tags: pandas, scipy, nan, pearson

Dropping 'nan' with Pearson's r in scipy/pandas


Quick question: is there a way to use 'dropna' with the Pearson's r function in scipy? I'm using it in conjunction with pandas, and some of my data has holes in it. I know you used to be able to suppress 'nan' with Spearman's r in older versions of scipy, but that functionality is now missing.

To my mind, this seems like a disimprovement, so I wonder if I'm missing something obvious.

My code:

for i in range(len(frame3.columns)):
    correlation.append(sp.pearsonr(frame3.iloc[:, i], control['CONTROL']))

Solution

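  • You can use np.isnan to mask out the incomplete pairs, like this:

    # Assumes: import numpy as np; import scipy.stats as sp
    for i in range(len(frame3.columns)):
        x, y = frame3.iloc[:, i].values, control['CONTROL'].values
        # True wherever either value of the pair is missing
        nas = np.logical_or(np.isnan(x), np.isnan(y))
        # pearsonr returns (r, p-value), computed on complete pairs only
        corr = sp.pearsonr(x[~nas], y[~nas])
        correlation.append(corr)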
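For anyone who wants to try the masking approach end to end, here is a minimal self-contained sketch. The helper name pearsonr_dropna and the toy data standing in for frame3 and control are hypothetical, made up for illustration; only np.isnan, scipy.stats.pearsonr and DataFrame.corrwith are real library calls. It also shows that pandas can do the pairwise dropping for you if you only need r and not the p-value.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearsonr_dropna(x, y):
    """Pearson's r and p-value using only the rows where both inputs are non-NaN.

    Hypothetical helper name, not part of scipy or pandas.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))  # True where both values are present
    r, p = stats.pearsonr(x[keep], y[keep])
    return r, p


# Made-up stand-ins for the question's frame3 and control.
frame3 = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 11.0],
})
control = pd.DataFrame({"CONTROL": [1.0, 2.0, 3.0, np.nan, 5.0]})

# One (r, p-value) pair per column, keyed by column name.
correlation = {
    col: pearsonr_dropna(frame3[col], control["CONTROL"])
    for col in frame3.columns
}

# pandas excludes incomplete pairs on its own, but returns only r.
r_only = frame3.corrwith(control["CONTROL"])
```

Keying the results by column name rather than appending to a list makes it easier to see which correlation belongs to which column. If the p-values are not needed, the corrwith line alone may be enough.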