pythonfeaturetools

Featuretools: Can it be applied on a single table to generate features even when there is no datetime related column?


The featuretools documentation states in its very first sentence:

"Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning."

This seems to imply that dataset must have a datetime column. I just want to have it confirmed that this is actually so. That is, for example, I cannot use it on 'iris' dataset to generate new features? If dataset need not have time variable, how would I use it to generate features on 'iris' dataset. I will be grateful for a reply. Thanks.


Solution

  • Featuretools works for relational datasets with or without datetimes and in answer to your question, Featuretools can make features for a single table without a datetime. For the iris dataset, there is only a single table and no immediate features on which to normalize (make a new table from an existing table), so you would use transform primitives to make new features.

    1. Make an EntitySet
    2. Add a single entity
    3. Run Deep Feature Synthesis using whichever transform primitives you want.

    Here is a complete working example:

    from sklearn.datasets import load_iris
    import pandas as pd
    import featuretools as ft
    
    # Load data and put into dataframe
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns = iris.feature_names)
    df['species'] = iris.target
    df['species'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})
    # Make an entityset and add the entity
    es = ft.EntitySet(id = 'iris')
    es = es.add_dataframe(
          dataframe_name="data",
          dataframe=df,
          index="index",
    )
    
    # Run deep feature synthesis with transformation primitives
    feature_matrix, feature_defs = ft.dfs(entityset = es, target_dataframe_name = 'data',
                                          trans_primitives = ['add_numeric', 'multiply_numeric'])
    
    feature_matrix.head()
    

    First five rows of feature matrix:

                      sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm) species  petal width (cm) + sepal width (cm)  petal length (cm) + petal width (cm)  petal length (cm) + sepal length (cm)  petal length (cm) + sepal width (cm)  sepal length (cm) + sepal width (cm)  petal width (cm) + sepal length (cm)  petal length (cm) * sepal width (cm)  sepal length (cm) * sepal width (cm)  petal width (cm) * sepal length (cm)  petal width (cm) * sepal width (cm)  petal length (cm) * sepal length (cm)  petal length (cm) * petal width (cm)  petal width (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm)  petal width (cm) + sepal width (cm) * sepal length (cm)  petal length (cm) + petal width (cm) * petal width (cm)  petal width (cm) + sepal length (cm) * sepal length (cm)  petal length (cm) * petal width (cm) + sepal length (cm)  petal width (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) + sepal length (cm) * sepal width (cm)  petal length (cm) + petal width (cm) * sepal length (cm)  petal length (cm) + sepal length (cm) * petal width (cm) + sepal width (cm)  petal length (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) + sepal width (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) + sepal width (cm) * sepal length (cm)  petal length (cm) + petal width (cm) * petal length (cm) + sepal length (cm)  petal length (cm) + sepal length (cm) * petal width (cm)  petal length (cm) + sepal width (cm) * sepal width (cm)  petal length (cm) + petal width (cm) * petal length (cm) + sepal width (cm)  sepal length (cm) + sepal width (cm) * sepal width (cm)  petal length (cm) + sepal length (cm) * petal width (cm) + sepal length (cm)  petal width (cm) + sepal length (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) + petal width (cm) * petal width (cm) + sepal width (cm)  petal length (cm) + sepal width (cm) * petal width (cm) + sepal length (cm)  petal length (cm) * petal length (cm) + sepal length (cm)  petal width (cm) * petal width (cm) + sepal width (cm)  petal length (cm) + petal width (cm) * sepal length (cm) + sepal width (cm)  petal length (cm) * petal width (cm) + sepal width (cm)  petal length (cm) + sepal width (cm) * petal width (cm)  petal length (cm) * petal length (cm) + petal width (cm)  petal length (cm) + petal width (cm) * sepal width (cm)  petal length (cm) + sepal length (cm) * sepal length (cm)  petal width (cm) + sepal length (cm) * sepal width (cm)  petal length (cm) * petal length (cm) + sepal width (cm)  petal width (cm) + sepal width (cm) * sepal width (cm)  petal length (cm) + sepal length (cm) * petal length (cm) + sepal width (cm)  sepal length (cm) * sepal length (cm) + sepal width (cm)  petal width (cm) * petal width (cm) + sepal length (cm)  petal length (cm) + sepal width (cm) * petal width (cm) + sepal width (cm)  petal length (cm) + petal width (cm) * petal width (cm) + sepal length (cm)  petal width (cm) + sepal length (cm) * petal width (cm) + sepal width (cm)
        index                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
        0                    5.1               3.5                1.4               0.2  setosa                                  3.7                                   1.6                                    6.5                                   4.9                                   8.6                                   5.3                                  4.90                                 17.85                                  1.02                                 0.70                                   7.14                                  0.28                                              31.82                                                                       18.87                                                     0.32                                                    27.03                                                      7.42                                                      1.72                                                    22.75                                                      8.16                                                     24.05                                                                        55.90                                                                         12.04                                                     42.14                                                                        24.99                                                     10.40                                                                          1.30                                                     17.15                                                     7.84                                                                        30.10                                                    34.45                                                                         45.58                                                                         5.92                                                                       25.97                                                                         9.10                                                       0.74                                                   13.76                                                                         5.18                                                     0.98                                                     2.24                                                      5.60                                                    33.15                                                      18.55                                                     6.86                                                     12.95                                                   31.85                                                                         43.86                                                      1.06                                                    18.13                                                                        8.48                                                                        19.61                         
        1                    4.9               3.0                1.4               0.2  setosa                                  3.2                                   1.6                                    6.3                                   4.4                                   7.9                                   5.1                                  4.20                                 14.70                                  0.98                                 0.60                                   6.86                                  0.28                                              25.28                                                                       15.68                                                     0.32                                                    24.99                                                      7.14                                                      1.58                                                    18.90                                                      7.84                                                     20.16                                                                        49.77                                                                         11.06                                                     34.76                                                                        21.56                                                     10.08                                                                          1.26                                                     13.20                                                     7.04                                                                        23.70                                                    32.13                                                                         40.29                                                                         5.12                                                                       22.44                                                                         8.82                                                       0.64                                                   12.64                                                                         4.48                                                     0.88                                                     2.24                                                      4.80                                                    30.87                                                      15.30                                                     6.16                                                      9.60                                                   27.72                                                                         38.71                                                      1.02                                                    14.08                                                                        8.16                                                                        16.32                         
        2                    4.7               3.2                1.3               0.2  setosa                                  3.4                                   1.5                                    6.0                                   4.5                                   7.9                                   4.9                                  4.16                                 15.04                                  0.94                                 0.64                                   6.11                                  0.26                                              26.86                                                                       15.98                                                     0.30                                                    23.03                                                      6.37                                                      1.58                                                    19.20                                                      7.05                                                     20.40                                                                        47.40                                                                         10.27                                                     35.55                                                                        21.15                                                      9.00                                                                          1.20                                                     14.40                                                     6.75                                                                        25.28                                                    29.40                                                                         38.71                                                                         5.10                                                                       22.05                                                                         7.80                                                       0.68                                                   11.85                                                                         4.42                                                     0.90                                                     1.95                                                      4.80                                                    28.20                                                      15.68                                                     5.85                                                     10.88                                                   27.00                                                                         37.13                                                      0.98                                                    15.30                                                                        7.35                                                                        16.66                         
        3                    4.6               3.1                1.5               0.2  setosa                                  3.3                                   1.7                                    6.1                                   4.6                                   7.7                                   4.8                                  4.65                                 14.26                                  0.92                                 0.62                                   6.90                                  0.30                                              25.41                                                                       15.18                                                     0.34                                                    22.08                                                      7.20                                                      1.54                                                    18.91                                                      7.82                                                     20.13                                                                        46.97                                                                         11.55                                                     35.42                                                                        21.16                                                     10.37                                                                          1.22                                                     14.26                                                     7.82                                                                        23.87                                                    29.28                                                                         36.96                                                                         5.61                                                                       22.08                                                                         9.15                                                       0.66                                                   13.09                                                                         4.95                                                     0.92                                                     2.55                                                      5.27                                                    28.06                                                      14.88                                                     6.90                                                     10.23                                                   28.06                                                                         35.42                                                      0.96                                                    15.18                                                                        8.16                                                                        15.84                         
        4                    5.0               3.6                1.4               0.2  setosa                                  3.8                                   1.6                                    6.4                                   5.0                                   8.6                                   5.2                                  5.04                                 18.00                                  1.00                                 0.72                                   7.00                                  0.28                                              32.68                                                                       19.00                                                     0.32                                                    26.00                                                      7.28                                                      1.72                                                    23.04                                                      8.00                                                     24.32                                                                        55.04                                                                         12.04                                                     43.00                                                                        25.00                                                     10.24                                                                          1.28                                                     18.00                                                     8.00                                                                        30.96                                                    33.28                                                                         44.72                                                                         6.08                                                                       26.00                                                                         8.96                                                       0.76                                                   13.76                                                                         5.32                                                     1.00                                                     2.24                                                      5.76                                                    32.00                                                      18.72                                                     7.00                                                     13.68                                                   32.00                                                                         43.00                                                      1.04                                                    19.00                                                                        8.32                                                                        19.76
    

    For more on primitives, see here and for a list of all the transform primitives see here.

    Featuretools works best for relational datasets with many tables, but it can also work well for single tables. There are several guides working from a single table where an entity is normalized in order to create multiple tables. The taxi trip duration project is a good example of this method.

    With the iris dataset, there are only 4 numerical features - assuming you are predicting the species - and these do not directly make for values on which you can normalize. However, you could apply a clustering technique such as KMeans clustering to the numerical features and then create an entity based on the cluster assignments. The predict remaining useful life project has an example of this technique.

    Keep in mind that most of the examples use the older version of Featuretools. To check the difference, please refer to link.