python numpy average stdev

Calculate the Average with the Lowest STDEV from N-1 Samples


I need help coding a function in Python that calculates the average and standard deviation from N-1 samples.

I have 96 rows of quadruplicate samples: a total of 384 samples in a 96x4 numpy array.

For each row, I would like to:

  1. Take out one sample from the quadruplicate so it becomes a triplicate

    [30,38,23,21] becomes [38,23,21]
    
  2. Calculate the mean and sample standard deviation of that triplicate

    mean = 27.33, stdev = 9.29
    
  3. Put that sample back so the row is a quadruplicate again

    [38,23,21] becomes [30,38,23,21]
    
  4. Repeat Steps 1-3 three more times, taking out a different sample each time

    [30,23,21]: mean = 24.67, stdev = 4.73
    [30,38,21]: mean = 29.67, stdev = 8.50
    [30,38,23]: mean = 30.33, stdev = 7.51
    
  5. Among the four results, find the average with the lowest standard deviation

    [30,23,21]: mean = 24.67, stdev = 4.73
    
  6. Move on to the next row and repeat Steps 1-5

  7. The output is a 96x1 array holding the selected average for each corresponding row

Basically, I want to calculate the mean and standard deviation under the assumption that one of the quadruplicates is an outlier.

I tried coding a function with nested for-loops, but it became too long and ugly. I need advice on a smarter way.


Solution

  • I came up with the following:

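    import numpy as np
    
    def bestMean(rows):
        bestMeans = []
        for row in rows:
            # Leave one sample out at a time. np.delete works for lists and arrays;
            # row[:k] + row[k+1:] would add numpy arrays instead of joining them.
            subsets = [np.delete(row, k) for k in range(len(row))]
            mean = [np.mean(s) for s in subsets]
            # ddof=1 gives the sample standard deviation used in the question.
            std = [np.std(s, ddof=1) for s in subsets]
            # Keep the mean of the subset with the smallest spread, plus that spread.
            bestMeans.append((mean[np.argmin(std)], np.min(std)))
        return bestMeans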
    

    I did a quick test and it seemed to work. It returns a (mean, stdev) pair for each row. Note, though, that this isn't the fastest option out there, but it is quite readable.
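
    If speed matters, the per-row loop can probably be replaced by array operations. The following is only a minimal sketch, assuming the input is a 2-D array with one replicate group per row and that the sample standard deviation (ddof=1) is wanted, as in the question's numbers. The name best_mean_vectorized is made up for illustration.

```python
import numpy as np

def best_mean_vectorized(data):
    """For each row, return the leave-one-out mean with the lowest sample stdev."""
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape

    # keep[k] is a boolean mask that drops column k and keeps the rest.
    keep = ~np.eye(n_cols, dtype=bool)

    # subsets[i, k] is row i with sample k removed.
    # Shape: (n_rows, n_cols, n_cols - 1)
    subsets = np.stack([data[:, keep[k]] for k in range(n_cols)], axis=1)

    means = subsets.mean(axis=2)          # shape (n_rows, n_cols)
    stds = subsets.std(axis=2, ddof=1)    # sample stdev of each subset

    # Index of the subset with the smallest stdev in each row
    # (argmin returns the first one in case of a tie).
    best = stds.argmin(axis=1)

    # Pick the matching mean per row and return it as an (n_rows, 1) column.
    return means[np.arange(n_rows), best].reshape(-1, 1)
```

    For the example row, best_mean_vectorized([[30, 38, 23, 21]]) should give a 1x1 array holding roughly 24.67, and a 96x4 input should give a 96x1 array. The only remaining Python loop runs over the four columns, not over the rows.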