python plot extrapolation

Getting a mean curve of several curves with x-values not being the same


I have several datasets containing many x- and y-values. An example with a lot fewer values would look something like this:

data_set1:

x1          y1        
---------   ---------   
0           100
0.0100523   65.1077
0.0201047   64.0519
0.030157    63.0341
0.0402094   62.1309
0.0502617   61.3649
0.060314    60.8614
0.0703664   60.3555
0.0804187   59.7635
0.0904711   59.1787

data_set2:

x2          y2        
---------   ---------   
0           100
0.01        66.119
0.02        64.4593
0.03        63.1377
0.04        62.0386
0.05        61.0943
0.06        60.2811
0.07        59.5603
0.08        58.8908

So here I have (for this example) two data sets containing 10 x- and y-values. The y-values are always different, but in some cases the x-values will be the same, and sometimes they will be different, as in this case. Not by a lot, but they are still different. Plotting these two data sets yields two different curves, and I would now like to make a mean curve of both. If the x-values were the same, I would just take the mean of the y-values and plot them against the x-values, but as stated, they are sometimes different and sometimes the same. Is there some way to extrapolate, or something like that, so that I could average the values (again, for many data sets) without "just guessing" or saying "they are pretty much the same, so it will be okay just to average the y-values"? Extrapolation seems like a plausible way of doing this, but I have never used it in Python, and maybe there are even better ways to do this?


Solution

  • If you have the same number of points in each dataset (your example doesn't, but your post states that you do), you can take the mean of the corresponding x values from each set and the mean of the corresponding y values. If you do not have the same number of values, you could follow the answers in this post

    For example, given your data, but with 9 points each:

    >>> x1
    array([0.       , 0.0100523, 0.0201047, 0.030157 , 0.0402094, 0.0502617,
           0.060314 , 0.0703664, 0.0804187])
    >>> y1
    array([100.    ,  65.1077,  64.0519,  63.0341,  62.1309,  61.3649,
            60.8614,  60.3555,  59.7635])
    >>> x2
    array([0.  , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
    >>> y2
    array([100.    ,  66.119 ,  64.4593,  63.1377,  62.0386,  61.0943,
            60.2811,  59.5603,  58.8908])
    

    You can do:

    import numpy as np
    
    # Element-wise mean across the two datasets (arrays must have equal length)
    mean_x = np.mean((x1, x2), axis=0)
    mean_y = np.mean((y1, y2), axis=0)
    

    To show this visually, you can plot the result. Here, the black line is the mean line, and the blue and orange lines are your original datasets:

    import matplotlib.pyplot as plt
    plt.plot(x1,y1)
    plt.plot(x2,y2)
    plt.plot(mean_x,mean_y, color='black')
    plt.show()
    

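When the datasets do not have the same number of points, the element-wise mean above cannot be applied directly. One possible alternative, offered here only as a sketch and not as part of the answer above, is to linearly interpolate every curve onto a shared x grid with `np.interp` and then average the interpolated y-values. The helper name `mean_curve` and the grid size of 50 points are illustrative assumptions, and the grid is restricted to the x-range that all curves cover so that no extrapolation is needed.

```python
import numpy as np

# The two datasets from the question (10 and 9 points respectively)
x1 = np.array([0, 0.0100523, 0.0201047, 0.030157, 0.0402094, 0.0502617,
               0.060314, 0.0703664, 0.0804187, 0.0904711])
y1 = np.array([100, 65.1077, 64.0519, 63.0341, 62.1309, 61.3649,
               60.8614, 60.3555, 59.7635, 59.1787])
x2 = np.array([0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
y2 = np.array([100, 66.119, 64.4593, 63.1377, 62.0386, 61.0943,
               60.2811, 59.5603, 58.8908])


def mean_curve(curves, num=50):
    """Average several (x, y) curves on a common grid.

    Hypothetical helper: `curves` is a list of (x, y) array pairs,
    each with x sorted in increasing order (required by np.interp).
    """
    # Use only the x-range shared by all curves, so nothing is extrapolated
    lo = max(x.min() for x, _ in curves)
    hi = min(x.max() for x, _ in curves)
    grid = np.linspace(lo, hi, num)
    # Linearly interpolate each curve onto the grid, then average point by point
    ys = np.array([np.interp(grid, x, y) for x, y in curves])
    return grid, ys.mean(axis=0)


grid, mean_y = mean_curve([(x1, y1), (x2, y2)])
```

The resulting `grid` and `mean_y` can be plotted the same way as `mean_x` and `mean_y` above. Linear interpolation is a simplifying choice here; for strongly curved data, a smoother interpolant may be a better fit.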