python arrays numpy nan

numpy array: replace nan values with average of columns


I've got a NumPy array filled mostly with real numbers, but there are a few NaN values in it as well.

How can I replace each NaN with the mean of the column it is in?


Solution

  • No loops required:

    import numpy as np

    a = np.array([[0.93230948, np.nan,     0.47773439, 0.76998063],
                  [0.94460779, 0.87882456, 0.79615838, 0.56282885],
                  [0.94272934, 0.48615268, 0.06196785, np.nan],
                  [0.64940216, 0.74414127, np.nan,     np.nan]])
    print(a)
    # [[ 0.93230948         nan  0.47773439  0.76998063]
    #  [ 0.94460779  0.87882456  0.79615838  0.56282885]
    #  [ 0.94272934  0.48615268  0.06196785         nan]
    #  [ 0.64940216  0.74414127         nan         nan]]
    
    # Column means that ignore NaNs; np.nanmean is convenient here.
    col_mean = np.nanmean(a, axis=0)
    print(col_mean)
    # [ 0.86726219  0.7030395   0.44528687  0.66640474]
    
    # Row and column indices of every NaN that needs replacing
    inds = np.where(np.isnan(a))
    
    # Put the column means into those positions. inds[1] holds the column
    # index of each NaN, so np.take picks the matching mean for each one.
    a[inds] = np.take(col_mean, inds[1])
    
    print(a)
    # [[ 0.93230948  0.7030395   0.47773439  0.76998063]
    #  [ 0.94460779  0.87882456  0.79615838  0.56282885]
    #  [ 0.94272934  0.48615268  0.06196785  0.66640474]
    #  [ 0.64940216  0.74414127  0.44528687  0.66640474]]
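The same replacement can also be written with broadcasting instead of index arrays. This is a minimal sketch assuming a 2-D float array; `fill_nan_with_col_mean` is a hypothetical helper name, not a NumPy function. `np.where(np.isnan(a), col_mean, a)` broadcasts the 1-D `col_mean` across every row, so each NaN picks up the mean of its own column. Unlike the in-place assignment to `a[inds]`, it returns a new array and leaves the input untouched. A column that is entirely NaN has no mean: `np.nanmean` returns NaN for it (with a `RuntimeWarning`), so those entries stay NaN.

```python
import numpy as np


def fill_nan_with_col_mean(a: np.ndarray) -> np.ndarray:
    """Return a copy of 2-D array `a` with each NaN replaced by its column mean.

    Hypothetical helper for illustration. All-NaN columns stay NaN, because
    np.nanmean has nothing to average there.
    """
    col_mean = np.nanmean(a, axis=0)  # shape (n_cols,), NaNs ignored
    # col_mean broadcasts across the rows, so each NaN gets its own column's mean
    return np.where(np.isnan(a), col_mean, a)


a = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 7.0, 9.0]])
filled = fill_nan_with_col_mean(a)
print(filled.tolist())  # [[1.0, 6.0, 3.0], [4.0, 5.0, 6.0], [2.5, 7.0, 9.0]]
```

Whether to mutate in place or return a copy is a matter of taste; the copy avoids surprising callers who still need the original NaN positions.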