I am using K-medoids to reduce my data (potential future scenarios of a stock's price) from 400 scenarios to n = 25. I computed the distance amtrix based on the Eucledian distance:
C = pairwise(Euclidean(), data', data')
where data
are my 400 scenarios - the data file can be found at link. Then I run K-medoids:
kmedoids(C, n, init = :kmpp)
I get the error:
ERROR: AssertionError: !(isempty(grp))
Does any of you know what could be wrong? It seems to run fine in Python (from sklearn_extra.cluster import KMedoids), but I guess the two K-medoids functions are different. I haven't been able to find any documentation on why it fails..
(In general, please provide minimal complete workable example code demonstrating the problem so readers can quickly reproduce it by importing the same packages and reading into the same data object. I guessed the following.)
import Clustering.kmedoids
import Distances.Euclidean
import Distances.pairwise
import DelimitedFiles.readdlm
data = readdlm("data.csv")
(Now your code reproduces the error.)
n = 25
C = pairwise(Euclidean(), data', data')
kmedoids(C, n, init = :kmpp)
ERROR: AssertionError: !(isempty(grp))
Stacktrace:
[1] _find_medoid(dist::Matrix{Float64}, grp::Vector{Int64})
@ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:229
[2] _kmedoids!(medoids::Vector{Int64}, dist::Matrix{Float64}, maxiter::Int64, tol::Float64, displevel::Int64)
@ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:148
[3] kmedoids(dist::Matrix{Float64}, k::Int64; init::Symbol, maxiter::Int64, tol::Float64, display::Symbol)
@ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:77
[4] top-level scope
@ REPL[13]:1
Maybe not enough distinct values in the data to create 25 or more different non-empty clusters.
Set(data)
produces
Set{Float64} with 24 elements:
-26.917
12.619999999999997
12.811
12.79
-4.722000000000001
-27.619000000000003
12.771
11.009999999999998
12.800999999999995
-27.649
9.980999999999995
12.780999999999999
10.780000000000001
12.820999999999998
12.790999999999997
-27.001
-27.586000000000002
10.210999999999999
7.9809999999999945
11.989999999999995
-4.579000000000001
7.9709999999999965
9.190999999999995
-11.868000000000002
The ?kmediods
REPL help documentation ends with a note on the algorithm.
The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).
The Clustering.kmedoids
documentation references
Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955
Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16