pandastimeseriestsne

Data manipulation with time series


My dataframe looks like this:

         ID             date         var1         var2
0        1100289299522  2020-12-01   109.046450   8.0125
1        1100289299522  2020-12-02   104.494946   6.1500
2        1100289299522  2020-12-03   117.011582   5.9375
3        1100289299522  2020-12-04   109.615388   5.4750
4        1100289299522  2020-12-05   142.803438   3.8500
...                ...         ...          ...      ...
960045  21380318319578  2021-05-27     7.524261  15.4875
960046  21380318319578  2021-05-28     3.256770  17.3625
960047  21380318319578  2021-05-29     0.561512  18.3250
960048  21380318319578  2021-05-30     1.347629  18.7625
960049  21380318319578  2021-05-31     0.112302  20.0750

Is there a simple way in pandas to have one ID per row and set columns like this:

ID             2020-12-01_var1  2020-12-02_var1 ...  2021-05-31_var1  2020-12-01_var2  2020-12-02_var2 ...  2021-05-31 _var2
1100289299522  109.046450       104.494946      ...  ___              8.0125           6.1500          ...  ___

Then i can use a dimensionality reduction algorithm (like TSNE) and maybe classify each time serie (and ID).

Do you think this is the correct way to proceed?


Solution

  • Try:

    out = df.pivot(index='ID', columns='date', values=['var1', 'var2'])
    out.columns = out.columns.to_flat_index().str.join('_')
    

    For your sample:

    >>> out
                    var1_2020-12-01  var1_2020-12-02  var1_2020-12-03  ...  var2_2021-05-29  var2_2021-05-30  var2_2021-05-31
    ID                                                                 ...
    1100289299522         109.04645       104.494946       117.011582  ...              NaN              NaN              NaN
    21380318319578              NaN              NaN              NaN  ...           18.325          18.7625           20.075
    
    [2 rows x 20 columns]