I have a pandas dataframe which looks like so:
time 000010 000017 000033 000034 000041 000042 \
0 672.246427 NaN NaN NaN 122.812927 367.110779 75.933125
1 672.253247 NaN NaN NaN 126.228996 372.775421 78.117798
2 672.260270 NaN NaN NaN 126.909046 369.460754 77.109196
3 672.267205 NaN NaN NaN 129.729416 376.499878 76.996864
4 672.274120 NaN NaN NaN 126.082420 380.343506 76.199158
5 672.281085 NaN NaN NaN 127.412136 387.227203 78.589165
6 672.288012 NaN NaN NaN 131.672180 394.507355 83.319740
7 672.294974 NaN NaN NaN 128.294861 390.472992 78.814026
8 672.301931 NaN NaN NaN 134.104858 393.601486 82.421974
9 672.308877 NaN NaN NaN 119.213364 393.934875 80.444237
10 672.315816 NaN NaN NaN 126.745148 378.437531 79.340736
11 672.322750 NaN NaN NaN 114.940750 367.477142 76.719002
12 672.329622 NaN NaN NaN 118.000877 364.089691 74.932938
which I intend to use with the module 'tsfresh' to extract features. The numbered column headers are object IDs and the time column is the time series.
This data frame is called 'data' and so I'm trying to use the extract features command:
extracted_features = extract_features(data, column_id = objs[1:], column_sort = "time")
where objs[1:] is the list of object IDs to the right of the column header "time".
This errors out with 'The truth value of an array with more than one element is ambiguous'. Can anyone help me make this work and extract a nice pandas dataframe of features?
Many thanks in advance!
Maybe I misunderstood your question, but if I understood it correctly, you need to reshape your dataframe into a form that tsfresh can understand.
The column_id parameter expects (as its name suggests) the name of a single column that holds the IDs, which you do not have. I think you have 6 different IDs (000010, 000017, 000033, 000034, 000041, 000042), each with 13 time series float values (let's call this kind of value data). So tsfresh wants a dataframe that looks like this:
id kind value time
000034 data 122.812927 672.246427
...
000041 data 367.110779 672.246427
...
You can then feed this into tsfresh using:
extract_features(df, column_id="id", column_kind="kind",
column_value="value", column_sort="time")
Also, you need to get rid of the NaN columns, because tsfresh does not know how to handle them.
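As a rough sketch of that reshaping, pandas' melt can turn the wide frame into the long id/kind/value/time layout, and dropna removes the missing values. The small frame below is only a stand-in built from the first rows of the question (with one all-NaN ID column kept for illustration), and the name long_df is my own, not something tsfresh requires:

```python
import numpy as np
import pandas as pd

# Stand-in for the wide frame from the question (a few rows and columns only).
data = pd.DataFrame({
    "time": [672.246427, 672.253247, 672.260270],
    "000010": [np.nan, np.nan, np.nan],
    "000034": [122.812927, 126.228996, 126.909046],
    "000041": [367.110779, 372.775421, 369.460754],
})

# Wide -> long: one row per (time, id) pair, with the old column name as the id.
long_df = data.melt(id_vars="time", var_name="id", value_name="value")

# Drop missing values; IDs that were entirely NaN disappear completely.
long_df = long_df.dropna(subset=["value"]).reset_index(drop=True)

# Only one kind of measurement here, so every row gets the same label.
long_df["kind"] = "data"

print(long_df)

# The result can then be passed on as in the call shown above, e.g.:
# from tsfresh import extract_features
# extracted_features = extract_features(long_df, column_id="id", column_kind="kind",
#                                       column_value="value", column_sort="time")
```

With only a single kind, the kind column is mainly there to match the layout shown above.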
Please have a look at our documentation on the data format: http://tsfresh.readthedocs.io/en/latest/text/data_formats.html