python list mean stdev

Python multiple lists of different lengths, averages and standard deviations


Given the list of lists below, I want to create new lists giving the average and standard deviation of each column. The rows have different lengths, so some columns have fewer values than others.

a = [ [1, 2, 3],
      [2, 3, 4],
      [3, 4, 5, 6],
      [1, 2],
      [7, 2, 3, 4]]

Required result

mean =  2.8, 2.6, 3.75, 5
STDEV=  2.48997992, 0.894427191, 0.957427108, 1.414213562

I found the example below, which computes the averages and seems to work very well, but I wasn't clear on how to adapt it for the standard deviation.

import numpy as np
import numpy.ma as ma
from itertools import zip_longest

a = [ [1, 2, 3],
      [2, 3, 4],
      [3, 4, 5, 6],
      [1, 2],
      [7, 2, 3, 4]]


averages = [np.ma.average(ma.masked_values(temp_list, None)) for temp_list in zip_longest(*a)]


print(averages)

Solution

  • You can use these two lines:

    >>> np.nanmean(np.array(list(zip_longest(*a)),dtype=float),axis=1)
    array([2.8 , 2.6 , 3.75, 5.  ])
    
    >>> np.nanstd(np.array(list(zip_longest(*a)),dtype=float),axis=1,ddof=1)
    array([2.48997992, 0.89442719, 0.95742711, 1.41421356])
    

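    nanmean and nanstd compute the mean and standard deviation respectively, ignoring NaN values. zip_longest pads the shorter rows with None, which dtype=float converts to nan, so you are passing them the array: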

    >>> np.array(list(zip_longest(*a)),dtype=float)
    array([[ 1.,  2.,  3.,  1.,  7.],
           [ 2.,  3.,  4.,  2.,  2.],
           [ 3.,  4.,  5., nan,  3.],
           [nan, nan,  6., nan,  4.]])
    

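    They then compute the mean and standard deviation of each row of that array (axis=1), ignoring NaNs. Each row corresponds to one column of the original list. The ddof argument stands for "delta degrees of freedom": the divisor used is N - ddof, where N is the number of non-NaN values. I set it to 1 (the sample standard deviation) based on your desired output; the default is 0 (the population standard deviation).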
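Putting the pieces together, here is a minimal self-contained sketch of the approach above, run on the data from the question. The variable names (`columns`, `means`, `sample_std`, `population_std`) are illustrative and not from the original answer; the second `nanstd` call is added only to show the effect of `ddof`.

```python
from itertools import zip_longest

import numpy as np

a = [[1, 2, 3],
     [2, 3, 4],
     [3, 4, 5, 6],
     [1, 2],
     [7, 2, 3, 4]]

# Transpose the ragged rows into columns. zip_longest pads the short rows
# with None, and dtype=float turns each None into nan.
columns = np.array(list(zip_longest(*a)), dtype=float)

# axis=1 reduces along each row of `columns`, i.e. each column of `a`.
means = np.nanmean(columns, axis=1)
sample_std = np.nanstd(columns, axis=1, ddof=1)   # divisor is N - 1
population_std = np.nanstd(columns, axis=1)       # default ddof=0, divisor is N

print(means)
print(sample_std)
print(population_std)
```

Note that with ddof=1, a column holding only a single value has a divisor of zero, so nanstd returns nan for it and emits a RuntimeWarning. That does not happen with this data, since the shortest column has two values.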