I have several datasets containing many x- and y-values. An example with a lot fewer values would look something like this:
data_set1:
x1 y1
--------- ---------
0 100
0.0100523 65.1077
0.0201047 64.0519
0.030157 63.0341
0.0402094 62.1309
0.0502617 61.3649
0.060314 60.8614
0.0703664 60.3555
0.0804187 59.7635
0.0904711 59.1787
data_set2:
x2 y2
--------- ---------
0 100
0.01 66.119
0.02 64.4593
0.03 63.1377
0.04 62.0386
0.05 61.0943
0.06 60.2811
0.07 59.5603
0.08 58.8908
So here I have (for this example) two data sets containing 10 x- and y-values. The y-values are always different, but the x-values are sometimes the same and sometimes different, as in this case. Not by a lot, but they are different. Plotting these two data sets yields two different curves, and I would now like to make a mean curve of both. If the x-values were the same, I would just take the mean of the y-values and plot them against the x-values, but as stated, that is not always the case. Is there some way to extrapolate, or something like that, so that I could average the values (again, for many data sets) without "just guessing" or saying "they are pretty much the same, so it will be okay just to average the y-values"? Extrapolation seems like a plausible way of doing this, but I have never used it in Python, and maybe there are even better ways to do this?
If you have the same number of points in each dataset (the example you have doesn't, but you state in your post that you do), you could just get the mean of the respective x values from each set, and the mean of the respective y values. If you do not have the same number of values, you could follow the answers in this post.
For example, given your data but with 9 points each:
>>> x1
array([0. , 0.0100523, 0.0201047, 0.030157 , 0.0402094, 0.0502617,
0.060314 , 0.0703664, 0.0804187])
>>> y1
array([100. , 65.1077, 64.0519, 63.0341, 62.1309, 61.3649,
60.8614, 60.3555, 59.7635])
>>> x2
array([0. , 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
>>> y2
array([100. , 66.119 , 64.4593, 63.1377, 62.0386, 61.0943,
60.2811, 59.5603, 58.8908])
You can do:
import numpy as np
# Stack the equal-length arrays and average element-wise (point i with point i)
mean_x = np.mean((x1, x2), axis=0)
mean_y = np.mean((y1, y2), axis=0)
To show this visually, you can plot the result. Here, the black line is the mean line, and the blue and orange lines are your original datasets:
import matplotlib.pyplot as plt
plt.plot(x1,y1)
plt.plot(x2,y2)
plt.plot(mean_x,mean_y, color='black')
plt.show()
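If the datasets do not have the same number of points, or you would rather not average the x-values, one possible alternative is to interpolate every curve onto a shared x-grid and then average only the y-values. The following is a minimal sketch of that idea using `np.interp` (linear interpolation), not a definitive solution. The grid size of 50 points and the names `datasets`, `x_common` and `y_interp` are my own assumptions; the grid is restricted to the x-range that all datasets share, so nothing is extrapolated.

```python
import numpy as np

# Data from the question (data_set1 has 10 points, data_set2 has 9).
x1 = np.array([0, 0.0100523, 0.0201047, 0.030157, 0.0402094,
               0.0502617, 0.060314, 0.0703664, 0.0804187, 0.0904711])
y1 = np.array([100, 65.1077, 64.0519, 63.0341, 62.1309,
               61.3649, 60.8614, 60.3555, 59.7635, 59.1787])
x2 = np.array([0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08])
y2 = np.array([100, 66.119, 64.4593, 63.1377, 62.0386,
               61.0943, 60.2811, 59.5603, 58.8908])

# Any number of (x, y) pairs can go in this list.
datasets = [(x1, y1), (x2, y2)]

# Shared grid covering only the overlapping x-range (assumed choice),
# so np.interp never has to go beyond the measured points.
x_min = max(x.min() for x, _ in datasets)
x_max = min(x.max() for x, _ in datasets)
x_common = np.linspace(x_min, x_max, 50)  # 50 points is an arbitrary choice

# Linearly interpolate each curve onto the shared grid
# (np.interp expects each x array to be increasing).
y_interp = np.array([np.interp(x_common, x, y) for x, y in datasets])

# Average the interpolated curves point by point.
mean_y = y_interp.mean(axis=0)
```

You can then plot `x_common` against `mean_y` in the same way as above. Linear interpolation is the simplest choice here; whether it is accurate enough depends on how densely your curves are sampled.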