python time-series frame ambiguous

Features dataframe with tsfresh


I have a pandas dataframe which looks like so:

            time  000010  000017  000033      000034      000041     000042  \
0     672.246427     NaN     NaN     NaN  122.812927  367.110779  75.933125   
1     672.253247     NaN     NaN     NaN  126.228996  372.775421  78.117798   
2     672.260270     NaN     NaN     NaN  126.909046  369.460754  77.109196   
3     672.267205     NaN     NaN     NaN  129.729416  376.499878  76.996864   
4     672.274120     NaN     NaN     NaN  126.082420  380.343506  76.199158   
5     672.281085     NaN     NaN     NaN  127.412136  387.227203  78.589165   
6     672.288012     NaN     NaN     NaN  131.672180  394.507355  83.319740   
7     672.294974     NaN     NaN     NaN  128.294861  390.472992  78.814026   
8     672.301931     NaN     NaN     NaN  134.104858  393.601486  82.421974   
9     672.308877     NaN     NaN     NaN  119.213364  393.934875  80.444237   
10    672.315816     NaN     NaN     NaN  126.745148  378.437531  79.340736   
11    672.322750     NaN     NaN     NaN  114.940750  367.477142  76.719002   
12    672.329622     NaN     NaN     NaN  118.000877  364.089691  74.932938

which I intend to use with the 'tsfresh' module to extract features. The numbered column headers are object IDs, and the 'time' column holds the timestamps of the time series.

This data frame is called 'data', and I'm trying to use the extract_features function:

extracted_features = extract_features(data, column_id = objs[1:], column_sort = "time")

where objs[1:] holds the object IDs, i.e. the column headers to the right of "time".

This fails with 'The truth value of an array with more than one element is ambiguous'. Can anyone help me make this work and extract a nice pandas dataframe of features?

Many thanks in advance!


Solution

  • Maybe I misunderstood your question, but if I understood it correctly, you need to reshape your dataframe into a form that tsfresh can understand.

    The column_id parameter expects (as its name suggests) the name of a column that holds the IDs, which you do not have. I think you have only 6 different IDs (000010, 000017, 000033, 000034, 000041, 000042), each with 13 time series float values (let's call this kind of value "data"). So tsfresh wants a dataframe that looks like this:

      id     kind  value       time
    000034   data  122.812927  672.246427
    ...
    000041   data  367.110779  672.246427   
    ...
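
The reshape from the wide layout in the question to this long layout can be sketched with pandas' melt. This is only a minimal sketch: the small `data` frame below is a stand-in built from a few values in the question, and the names `long_df` and the constant kind label "data" are assumptions chosen for illustration.

```python
import pandas as pd

# Stand-in for the wide `data` frame from the question (only two object
# columns and three rows, to keep the example short).
data = pd.DataFrame({
    "time": [672.246427, 672.253247, 672.260270],
    "000034": [122.812927, 126.228996, 126.909046],
    "000041": [367.110779, 372.775421, 369.460754],
})

# Wide -> long: every (time, object ID) pair becomes its own row.
long_df = data.melt(id_vars="time", var_name="id", value_name="value")

# All values are the same kind of measurement, so use one constant label.
long_df["kind"] = "data"

# Reorder the columns to match the layout shown above.
long_df = long_df[["id", "kind", "value", "time"]]
print(long_df)
```

Each object ID now appears once per time stamp, so the ID lives in a single column that can be named in column_id.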
    

    You can then feed this into tsfresh using:

    from tsfresh import extract_features
    # df is the long-format dataframe shown above
    extract_features(df, column_id="id", column_kind="kind",
                     column_value="value", column_sort="time")
    

    Also, you need to drop the NaN columns, because tsfresh does not know how to handle them.

    Please have a look at our documentation on the data formats: http://tsfresh.readthedocs.io/en/latest/text/data_formats.html
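
One possible way to remove the all-NaN columns is pandas' dropna on the wide frame, before any reshaping. This is a sketch on a small stand-in frame; the name `cleaned` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the wide frame from the question: one object column is
# entirely NaN, the other holds real measurements.
data = pd.DataFrame({
    "time": [672.246427, 672.253247],
    "000010": [np.nan, np.nan],
    "000034": [122.812927, 126.228996],
})

# Drop only the columns in which every value is NaN.
cleaned = data.dropna(axis="columns", how="all")
print(list(cleaned.columns))
```

Columns that are only partly NaN survive this step. If such columns exist, the remaining NaN rows could instead be dropped after reshaping to the long format, for example with dropna(subset=["value"]).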