pythonpandasinterpolationreindex

Reconcile channels with varying data acquisition frequencies


My dataframe has 8 columns for 4 different data recorded as a function of depth. However, each channel came with a different acquisition frequency. I would like to reconcile all these 4 data using a common depth interval.

Interpolation with Scipy looked complex, so is there any easier method to, for example, apply the first depth ('depth1') to all other data?

Here is a data sample:

df=pd.read_csv(file,sep='\t')
df
depth1  data1   depth2  data2   depth3  data3   depth4  data4
0   910.0   32.820  910 48.2    910.05  450.57  912.961414  -294.045478
1   910.1   33.610  911 48.2    910.20  1.14    922.966707  -447.780089
2   910.2   33.900  912 48.2    910.35  1.14    932.972000  -396.001844
3   910.3   34.190  913 48.4    910.50  1.43    942.976616  -391.830800
4   910.4   34.430  914 48.7    910.65  1.32    952.980427  -438.514022
5   910.5   34.670  915 48.9    910.80  1.54    962.984317  -679.421100
6   910.6   35.015  916 48.8    910.95  16.08   972.988514  -660.389044
7   910.7   35.360  917 49.0    911.10  8.16    982.993188  -671.841567
8   910.8   35.450  918 49.5    911.25  7.67    992.998200  -712.625933
9   910.9   35.540  919 49.4    911.40  8.86    1003.004001 -884.093533
10  911.0   35.825  920 49.5    911.55  8.70    1013.009802 -1124.780022
11  911.1   36.110  921 49.6    911.70  7.93    1023.015603 -1454.342144

Solution

  • The method of choice will remain reindex - interpolate indeed and it is not that complex. You don't have to use Scipy for that (the code that you transiently posted) since Pandas already has some built-in capabilities. Here the example of channel ("depth2", "data2") reindexed over the axis "depth1":

    Craft the new index

    Idx1 = pd.Index(df.set_index('depth1').index) # new index for data2
    Idx2 = pd.Index(df.set_index('depth2').index) # original data2 index
    
    # Merge indexes depth1 and depth2
    NewIdx = pd.Index(Idx2.union(Idx1), 
                      name = 'depth')
    

    Output:

    Index([910.0, 910.1, 910.2, 910.3, 910.4, 910.5, 910.6, 910.7, 910.8, 910.9,
           911.0, 911.1, 912.0, 913.0, 914.0, 915.0, 916.0, 917.0, 918.0, 919.0,
           920.0, 921.0],
          dtype='float64', name='depth')
    

    Operate the reindexing

    df2 = df[['depth2', 'data2']].set_index('depth2'    # the subset of df to be reindexed,
                         ).reindex(NewIdx               # reindexed, as a new dataframe
                         ).interpolate(method='linear') # gaps filled by interpolation 
    

    Finally select interpolated data only

    df2.loc[Idx1]

    This will plot the results:

    # Plot original and interpolated data separately:
    _=plt.plot(df2.reindex(Idx2).index, 
               df2.reindex(Idx2).values,'+')
    _=plt.plot(df2.reindex(Idx1).index, 
               df2.reindex(Idx1).values,'x')
    _=plt.xlim(909.9, 912.1)
    _=plt.xlabel('depth')
    _=plt.ylabel('data')
    plt.legend(['original data2','interpolated along depth1'])