My dataframe looks like this:
ID date var1 var2
0 1100289299522 2020-12-01 109.046450 8.0125
1 1100289299522 2020-12-02 104.494946 6.1500
2 1100289299522 2020-12-03 117.011582 5.9375
3 1100289299522 2020-12-04 109.615388 5.4750
4 1100289299522 2020-12-05 142.803438 3.8500
... ... ... ... ...
960045 21380318319578 2021-05-27 7.524261 15.4875
960046 21380318319578 2021-05-28 3.256770 17.3625
960047 21380318319578 2021-05-29 0.561512 18.3250
960048 21380318319578 2021-05-30 1.347629 18.7625
960049 21380318319578 2021-05-31 0.112302 20.0750
Is there a simple way in pandas to have one ID per row and set columns like this:
ID 2020-12-01_var1 2020-12-02_var1 ... 2021-05-31_var1 2020-12-01_var2 2020-12-02_var2 ... 2021-05-31 _var2
1100289299522 109.046450 104.494946 ... ___ 8.0125 6.1500 ... ___
Then i can use a dimensionality reduction algorithm (like TSNE) and maybe classify each time serie (and ID).
Do you think this is the correct way to proceed?
Try:
out = df.pivot(index='ID', columns='date', values=['var1', 'var2'])
out.columns = out.columns.to_flat_index().str.join('_')
For your sample:
>>> out
var1_2020-12-01 var1_2020-12-02 var1_2020-12-03 ... var2_2021-05-29 var2_2021-05-30 var2_2021-05-31
ID ...
1100289299522 109.04645 104.494946 117.011582 ... NaN NaN NaN
21380318319578 NaN NaN NaN ... 18.325 18.7625 20.075
[2 rows x 20 columns]