python, scikit-learn, time-series, classification, knn

How to use Dynamic Time Warping with kNN in Python


I have a time-series dataset with two labels (0 and 1). I am using Dynamic Time Warping (DTW) as a similarity measure for classification with k-nearest neighbours (kNN), as described in these two wonderful blog posts:

However, for classification with kNN, the two posts use their own kNN implementations.

I want to use sklearn's options such as GridSearchCV in my classification. Therefore, I would like to know how I can use Dynamic Time Warping (DTW) with sklearn's kNN.

Note: I am not limited to sklearn and am happy to receive answers in other libraries as well.

I am happy to provide more details if needed.


Solution

  • You can use a custom metric for kNN, so you only need to implement DTW yourself (or use/adapt any existing DTW implementation in Python) [gist of this code].

    import numpy as np
    from scipy.spatial import distance
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import classification_report
    
    # toy dataset: 100 random series of length 10, binary labels
    X = np.random.random((100, 10))
    y = np.random.randint(0, 2, (100))
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    
    # custom metric: classic O(n*m) dynamic-programming DTW
    def DTW(a, b):
        an = a.size
        bn = b.size
        # pairwise distances between every point of a and every point of b
        pointwise_distance = distance.cdist(a.reshape(-1, 1), b.reshape(-1, 1))
        # cumulative-cost matrix, padded with inf so the borders never win
        cumdist = np.full((an + 1, bn + 1), np.inf)
        cumdist[0, 0] = 0
    
        for ai in range(an):
            for bi in range(bn):
                # cheapest way to reach this cell from its three predecessors
                minimum_cost = min(cumdist[ai, bi + 1],
                                   cumdist[ai + 1, bi],
                                   cumdist[ai, bi])
                cumdist[ai + 1, bi + 1] = pointwise_distance[ai, bi] + minimum_cost
    
        return cumdist[an, bn]
    
    # train: grid-search over the number of neighbours
    parameters = {'n_neighbors': [2, 4, 8]}
    clf = GridSearchCV(KNeighborsClassifier(metric=DTW), parameters, cv=3, verbose=1)
    clf.fit(X_train, y_train)
    
    # evaluate
    y_pred = clf.predict(X_test)
    print(classification_report(y_test, y_pred))
    

    Which yields

    Fitting 3 folds for each of 3 candidates, totalling 9 fits        
    
    [Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:   29.0s finished
    
                             precision    recall  f1-score   support
    
                          0       0.57      0.89      0.70        18
                          1       0.60      0.20      0.30        15
    
                avg / total       0.58      0.58      0.52        33
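
  • Since you mentioned other libraries are welcome: tslearn ships a DTW-based kNN classifier, KNeighborsTimeSeriesClassifier, which follows the sklearn estimator API and should therefore plug straight into GridSearchCV. A minimal sketch, assuming the same X_train/y_train/X_test as above (tslearn accepts 2-D arrays of shape (n_series, length)):

        from tslearn.neighbors import KNeighborsTimeSeriesClassifier
        from sklearn.model_selection import GridSearchCV
        
        # tslearn's DTW is numba-accelerated, so this is typically much
        # faster than the pure-Python metric above
        parameters = {'n_neighbors': [2, 4, 8]}
        clf = GridSearchCV(KNeighborsTimeSeriesClassifier(metric="dtw"), parameters, cv=3)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)

    If you want to stay with sklearn's KNeighborsClassifier but the double loop above is too slow, you can also wrap an approximate DTW implementation such as fastdtw as the custom metric (a sketch, not benchmarked here):

        from fastdtw import fastdtw
        from sklearn.neighbors import KNeighborsClassifier
        
        def fast_DTW(a, b):
            # fastdtw returns (distance, warping_path); kNN only needs the distance
            dist, _ = fastdtw(a, b)
            return dist
        
        clf = KNeighborsClassifier(metric=fast_DTW)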