My dataframe has 8 columns for 4 different data recorded as a function of depth. However, each channel came with a different acquisition frequency. I would like to reconcile all these 4 data using a common depth interval.
Interpolation with Scipy looked complex, so is there any easier method to, for example, apply the first depth ('depth1'
) to all other data?
Here is a data sample:
df=pd.read_csv(file,sep='\t')
df
depth1 data1 depth2 data2 depth3 data3 depth4 data4
0 910.0 32.820 910 48.2 910.05 450.57 912.961414 -294.045478
1 910.1 33.610 911 48.2 910.20 1.14 922.966707 -447.780089
2 910.2 33.900 912 48.2 910.35 1.14 932.972000 -396.001844
3 910.3 34.190 913 48.4 910.50 1.43 942.976616 -391.830800
4 910.4 34.430 914 48.7 910.65 1.32 952.980427 -438.514022
5 910.5 34.670 915 48.9 910.80 1.54 962.984317 -679.421100
6 910.6 35.015 916 48.8 910.95 16.08 972.988514 -660.389044
7 910.7 35.360 917 49.0 911.10 8.16 982.993188 -671.841567
8 910.8 35.450 918 49.5 911.25 7.67 992.998200 -712.625933
9 910.9 35.540 919 49.4 911.40 8.86 1003.004001 -884.093533
10 911.0 35.825 920 49.5 911.55 8.70 1013.009802 -1124.780022
11 911.1 36.110 921 49.6 911.70 7.93 1023.015603 -1454.342144
The method of choice will remain reindex - interpolate indeed and it is not that complex. You don't have to use Scipy for that (the code that you transiently posted) since Pandas already has some built-in capabilities. Here the example of channel ("depth2", "data2")
reindexed over the axis "depth1"
:
Craft the new index
Idx1 = pd.Index(df.set_index('depth1').index) # new index for data2
Idx2 = pd.Index(df.set_index('depth2').index) # original data2 index
# Merge indexes depth1 and depth2
NewIdx = pd.Index(Idx2.union(Idx1),
name = 'depth')
Output:
Index([910.0, 910.1, 910.2, 910.3, 910.4, 910.5, 910.6, 910.7, 910.8, 910.9,
911.0, 911.1, 912.0, 913.0, 914.0, 915.0, 916.0, 917.0, 918.0, 919.0,
920.0, 921.0],
dtype='float64', name='depth')
Operate the reindexing
df2 = df[['depth2', 'data2']].set_index('depth2' # the subset of df to be reindexed,
).reindex(NewIdx # reindexed, as a new dataframe
).interpolate(method='linear') # gaps filled by interpolation
Finally select interpolated data only
df2.loc[Idx1]
This will plot the results:
# Plot original and interpolated data separately:
_=plt.plot(df2.reindex(Idx2).index,
df2.reindex(Idx2).values,'+')
_=plt.plot(df2.reindex(Idx1).index,
df2.reindex(Idx1).values,'x')
_=plt.xlim(909.9, 912.1)
_=plt.xlabel('depth')
_=plt.ylabel('data')
plt.legend(['original data2','interpolated along depth1'])