Tags: python-3.x, numpy, scipy, statistics, geometric-mean

How to run scipy.stats.gmean for rows containing values less than 1 and zeros?


We have the following dataframe (df)

print(df)

#Gene              GSM772      GSM773      GSM774      GSM775      GSM776
0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146
0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177
0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179
0610009O20Rik    0.000000    0.006830    0.000000    0.006653    0.006907
0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915

We want to calculate the geometric mean of every row.

Some rows contain zero values. These should not raise an error; the geometric mean for such a row should simply be regarded as zero.
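For reference, the geometric mean of n values is the n-th root of their product, or equivalently the exponential of the mean of their logarithms, so a single zero in a row forces the result to zero. Below is a minimal sketch with plain NumPy; the helper name `geometric_mean` is ours and only for illustration, not part of SciPy.

```python
import numpy as np

def geometric_mean(values):
    """Geometric mean computed as exp(mean(log(x))). Hypothetical helper for illustration."""
    arr = np.asarray(values, dtype=float)
    # log(0) evaluates to -inf; silence the divide-by-zero warning locally
    with np.errstate(divide="ignore"):
        return float(np.exp(np.log(arr).mean()))

# Two rows taken from the sample data in the question
row_without_zero = [0.003485, 0.003415, 0.005431, 0.003667, 0.007146]
row_with_zero = [0.0, 0.006830, 0.0, 0.006653, 0.006907]

print(geometric_mean(row_without_zero))  # roughly 0.0044
print(geometric_mean(row_with_zero))     # 0.0, because at least one factor is zero
```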

We wrote the following Python script:

import numpy as np
from scipy import stats

# Suppress the divide-by-zero warning that log(0) can emit for rows containing zeros
np.seterr(divide='ignore')

# Row-wise geometric mean over every column after #Gene
# (the original slice 1:5 left out the last column, GSM776)
geometric_mean = stats.gmean(df.iloc[:, 1:], axis=1)

# Append the result as a new column
results = df.assign(GeometricMean=geometric_mean)

print(results)
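To try this without the original file, here is a self-contained sketch that rebuilds the sample rows shown above by hand. It assumes pandas and SciPy are installed, and it uses `np.errstate` so the warning is suppressed only around the `gmean` call rather than globally.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Sample rows copied from the question; the fourth gene contains zeros
df = pd.DataFrame({
    "#Gene": ["0610007P14Rik", "0610009B22Rik", "0610009L18Rik",
              "0610009O20Rik", "0610010F05Rik"],
    "GSM772": [0.003485, 0.001220, 0.000055, 0.000000, 0.008310],
    "GSM773": [0.003415, 0.001351, 0.000009, 0.006830, 0.008329],
    "GSM774": [0.005431, 0.001762, 0.000152, 0.000000, 0.007091],
    "GSM775": [0.003667, 0.001404, 0.000082, 0.006653, 0.006919],
    "GSM776": [0.007146, 0.002177, 0.000179, 0.006907, 0.006915],
})

# Every column after #Gene, as a plain float array
values = df.iloc[:, 1:].to_numpy(dtype=float)

# Silence the log(0) warning only for this computation
with np.errstate(divide="ignore"):
    row_gmean = stats.gmean(values, axis=1)

results = df.assign(GeometricMean=row_gmean)
print(results)
```

With this data, the row containing zeros ends up with a geometric mean of 0.0, which is the behaviour asked for.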

Can anyone please suggest the best way to resolve this issue?

Thanks !!


Solution

  • Problem solved: the script above works without any issue. Sorry, this question was posted prematurely. We cannot delete it, so it will stay here in the hope that the script is useful to someone.

    Note that this script will not work if the columns passed to gmean contain strings. Once those columns are excluded, it generates the last column with the geometric mean of every row without any problem.

    print(df.shape)

    (5, 6)

    print(df)

               #Gene      GSM772      GSM773      GSM774      GSM775      GSM776
    0  0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146
    1  0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177
    2  0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179
    3  0610009O20Rik    0.006369    0.006830    0.007176    0.006653    0.006907
    4  0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915
    
    
    print(results)
    
               #Gene      GSM772      GSM773      GSM774      GSM775      GSM776  GeometricMean
    0  0610007P14Rik    0.003485    0.003415    0.005431    0.003667    0.007146       0.004424
    1  0610009B22Rik    0.001220    0.001351    0.001762    0.001404    0.002177       0.001548
    2  0610009L18Rik    0.000055    0.000009    0.000152    0.000082    0.000179       0.000064
    3  0610009O20Rik    0.006369    0.006830    0.007176    0.006653    0.006907       0.006782
    4  0610010F05Rik    0.008310    0.008329    0.007091    0.006919    0.006915       0.007484
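One way to avoid the string-column problem mentioned above is to select the numeric columns by dtype instead of by position. This is only a sketch on a two-row, two-sample excerpt of the data; `select_dtypes` is standard pandas, while the variable names are ours.

```python
import pandas as pd
from scipy import stats

# Small excerpt of the data: one string column, two numeric columns
df = pd.DataFrame({
    "#Gene": ["0610007P14Rik", "0610009B22Rik"],
    "GSM772": [0.003485, 0.001220],
    "GSM773": [0.003415, 0.001351],
})

# Keep only numeric columns so string columns such as #Gene never reach gmean
numeric = df.select_dtypes(include="number")

results = df.assign(GeometricMean=stats.gmean(numeric.to_numpy(), axis=1))
print(results)
```

Selecting by dtype keeps the script working if more string columns are added later, at the cost of silently including any numeric column that is not a sample.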