I have 6 time series, as follows.
import numpy as np
series = np.array([
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1],
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1]])
I want to compute the dynamic time warping (DTW) distance matrix for these series so that I can cluster them. I used the dtaidistance
library for that, as follows.
from dtaidistance import dtw
ds = dtw.distance_matrix_fast(series)
The output I got was as follows.
array([[ inf, 1.41421356, 2.23606798, 0. , 1.41421356, 2.23606798],
[ inf, inf, 1.73205081, 1.41421356, 0. , 1.73205081],
[ inf, inf, inf, 2.23606798, 1.73205081, 0. ],
[ inf, inf, inf, inf, 1.41421356, 2.23606798],
[ inf, inf, inf, inf, inf, 1.73205081],
[ inf, inf, inf, inf, inf, inf]])
It seems to me that the output I get is wrong. For instance, as I understand it, the diagonal values of the output should be 0
(since each series matches itself perfectly).
I want to know what I am doing wrong and how to fix it. I am also happy to get answers that use other Python libraries.
I am happy to provide more details if needed.
Everything is correct. As per the docs:
The result is stored in a matrix representation. Since only the upper triangular matrix is required this representation uses more memory than necessary.
All diagonal elements are 0, and the lower triangular matrix is the same as the upper triangular matrix mirrored at the diagonal. As all these values can be deduced from the upper triangular matrix, they aren't shown in the output.
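To see why the diagonal and the lower triangle carry no extra information, here is a minimal pure-Python sketch of textbook DTW. It assumes squared differences as the local cost, a square root at the end, and no warping window, which is consistent with the values in the question's output; it is an illustration, not the library's own code, and the name dtw_distance is made up for this example.

```python
import math

def dtw_distance(a, b):
    """Textbook DTW (hypothetical helper, not part of dtaidistance)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost: squared difference
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

s0 = [0., 0, 1, 2, 1, 0, 1, 0, 0]
s1 = [0., 1, 2, 0, 0, 0, 0, 0, 0]

d_self = dtw_distance(s0, s0)  # a series compared with itself
d_01 = dtw_distance(s0, s1)
d_10 = dtw_distance(s1, s0)    # same pair, arguments swapped
print(d_self, d_01, d_10)
```

A series aligned with itself costs nothing, and swapping the arguments does not change the result, so only the entries above the diagonal need to be computed.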
You can even use the compact=True
argument to get only the values of the upper triangular matrix, concatenated into a 1D array.
You can convert the result to a full matrix like this:
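# replace the inf placeholders (diagonal and lower triangle) with 0
ds[ds == np.inf] = 0
# mirror the upper triangle into the lower triangle
ds += ds.T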
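If you would rather not modify ds in place, the same conversion can be sketched with np.where. The matrix below is copied from the output in the question rather than recomputed, so the snippet runs without dtaidistance; the variable names filled and full are only illustrative.

```python
import numpy as np

inf = np.inf
# Upper-triangular result as printed in the question (values rounded there).
ds = np.array([
    [inf, 1.41421356, 2.23606798, 0.0, 1.41421356, 2.23606798],
    [inf, inf, 1.73205081, 1.41421356, 0.0, 1.73205081],
    [inf, inf, inf, 2.23606798, 1.73205081, 0.0],
    [inf, inf, inf, inf, 1.41421356, 2.23606798],
    [inf, inf, inf, inf, inf, 1.73205081],
    [inf, inf, inf, inf, inf, inf],
])

# Replace every inf placeholder with 0 without touching the original array.
filled = np.where(np.isinf(ds), 0.0, ds)

# The diagonal and lower triangle of `filled` are 0, so adding the transpose
# mirrors the upper triangle and leaves the diagonal at 0.
full = filled + filled.T
print(full)
```

The resulting full matrix is symmetric with a zero diagonal, which is the form most clustering routines expect for a precomputed distance matrix.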