Julia - AssertionError in K-medoids algorithm

I am using K-medoids to reduce my data (potential future scenarios of a stock's price) from 400 scenarios to n = 25. I computed the distance amtrix based on the Eucledian distance:

C = pairwise(Euclidean(), data', data')

where data are my 400 scenarios - the data file can be found at link. Then I run K-medoids:

kmedoids(C, n, init = :kmpp)

I get the error:

ERROR: AssertionError: !(isempty(grp))

Does any of you know what could be wrong? It seems to run fine in Python (from sklearn_extra.cluster import KMedoids), but I guess the two K-medoids functions are different. I haven't been able to find any documentation on why it fails..

Solution

(Example)

(In general, please provide minimal complete workable example code demonstrating the problem so readers can quickly reproduce it by importing the same packages and reading into the same data object. I guessed the following.)

import Clustering.kmedoids
import Distances.Euclidean
import Distances.pairwise
import DelimitedFiles.readdlm

data = readdlm("data.csv")

(Now your code reproduces the error.)

n = 25
C = pairwise(Euclidean(), data', data')
kmedoids(C, n, init = :kmpp)

ERROR: AssertionError: !(isempty(grp))
Stacktrace:
 [1] _find_medoid(dist::Matrix{Float64}, grp::Vector{Int64})
   @ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:229
 [2] _kmedoids!(medoids::Vector{Int64}, dist::Matrix{Float64}, maxiter::Int64, tol::Float64, displevel::Int64)
   @ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:148
 [3] kmedoids(dist::Matrix{Float64}, k::Int64; init::Symbol, maxiter::Int64, tol::Float64, display::Symbol)
   @ Clustering C:\Users\...\.julia\packages\Clustering\JwhfU\src\kmedoids.jl:77
 [4] top-level scope
   @ REPL[13]:1

Answer

Maybe not enough distinct values in the data to create 25 or more different non-empty clusters.

Set(data)

produces

Set{Float64} with 24 elements:
  -26.917
  12.619999999999997
  12.811
  12.79
  -4.722000000000001
  -27.619000000000003
  12.771
  11.009999999999998
  12.800999999999995
  -27.649
  9.980999999999995
  12.780999999999999
  10.780000000000001
  12.820999999999998
  12.790999999999997
  -27.001
  -27.586000000000002
  10.210999999999999
  7.9809999999999945
  11.989999999999995
  -4.579000000000001
  7.9709999999999965
  9.190999999999995
  -11.868000000000002

The ?kmediods REPL help documentation ends with a note on the algorithm.

The function implements a K-means style algorithm instead of PAM (Partitioning Around Medoids). K-means style algorithm converges in fewer iterations, but was shown to produce worse (10-20% higher total costs) results (see e.g. Schubert & Rousseeuw (2019)).

The Clustering.kmedoids documentation references

Teitz, M.B. and Bart, P. (1968). Heuristic Methods for Estimating the Generalized Vertex Median of a Weighted Graph. Operations Research, 16(5), 955–961. doi:10.1287/opre.16.5.955

Schubert, E. and Rousseeuw, P.J. (2019). Faster k-medoids clustering: Improving the PAM, CLARA, and CLARANS Algorithms. SISAP, 171-187. doi:10.1007/978-3-030-32047-8_16