I'm trying to use the Fast Density-Based Clustering Validation (DBCV) Python package from R through the reticulate R library, but I'm getting an error I cannot solve. I'm using a Dell computer running Xubuntu Linux 22.04.05, with Python 3.11.9 and R 4.3.1.
Here are my steps:
In a shell terminal, I create a Python virtual environment and then install the packages needed:
python3 -m venv dbcv_environment
dbcv_environment/bin/pip3 install scikit-learn numpy
dbcv_environment/bin/pip3 install "git+https://github.com/FelSiq/DBCV"
Then, in R, I install the needed R packages, load the Python environment created above, generate a sample dataset and its labels, and try to apply the dbcv() function:
setwd(".")
options(stringsAsFactors = FALSE)
options(repos = list(CRAN="http://cran.rstudio.com/"))
list.of.packages <- c("pacman")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if (length(new.packages)) install.packages(new.packages)
library("pacman")
p_load("reticulate")
data <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), nrow = 5, byrow = TRUE)
labels <- c(0,0,1,1,1)
use_virtualenv("./dbcv_environment")
dbcvLib <- import("dbcv")
dbcvLib$dbcv(X=data, y=labels)
But when I execute the last command, I get the following error:
Error in py_call_impl(callable, call_args$unnamed, call_args$named) :
ValueError: zero-size array to reduction operation maximum which has no identity
Run `reticulate::py_last_error()` for details.
Does anybody know how to solve this problem? Any help will be appreciated, thanks!
The error
ValueError: zero-size array to reduction operation maximum which has no identity
likely originates from the fact that the algorithm cannot identify any clusters given the structure of your example feature data (the message says that NumPy was asked to compute the maximum of an empty array).
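The message itself comes from NumPy and can be reproduced without dbcv at all. The following is only a minimal sketch of that general behaviour; it does not show where inside dbcv the empty array arises.

```python
import numpy as np

# An array with no elements has no maximum, so NumPy raises a ValueError.
empty = np.array([])

try:
    np.max(empty)
except ValueError as err:
    message = str(err)
    print(message)
```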
To check that there's nothing wrong with reticulate here, you can run the equivalent Python code (save the code below in a script and run it in a shell using the Python interpreter from your venv).
import numpy as np
from dbcv import dbcv
# data matrix
data = np.array([[1, 2],
                 [3, 4],
                 [5, 6],
                 [7, 8],
                 [9, 10]])
# labels array
labels = np.array([0, 0, 1, 1, 1])
# Calculate DBCV score
result = dbcv(X=data, y=labels)
print(result)
Running through reticulate works fine for data that clearly contains clusters:
data <- matrix(c(
1, 1, # Cluster 0
1.2, 1.1,
0.8, 1.2,
4, 4, # Cluster 1
4.2, 4.1,
3.8, 4.2
), ncol = 2, byrow = TRUE)
labels <- c(0, 0, 0, 1, 1, 1)
(
result <- dbcvLib$dbcv(X=data, y=labels)
)
[1] 1
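To illustrate how the two example datasets differ in structure, here is a rough sketch using only NumPy. The helper `nearest_neighbor_distances` is a hypothetical name introduced for this illustration and is not part of dbcv. It only compares point spacing and does not reproduce what dbcv computes internally.

```python
import numpy as np

def nearest_neighbor_distances(X):
    """Distance from each point to its closest other point (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Ignore the zero distance of each point to itself.
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)

# Data from the question: evenly spaced points on a straight line.
line_data = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])

# Data from this answer: two tight, well-separated groups.
clustered_data = np.array([[1, 1], [1.2, 1.1], [0.8, 1.2],
                           [4, 4], [4.2, 4.1], [3.8, 4.2]])

line_nn = nearest_neighbor_distances(line_data)
clustered_nn = nearest_neighbor_distances(clustered_data)

print(line_nn)       # every point is equally far from its nearest neighbour
print(clustered_nn)  # small within-group distances
```

In the question's data every point sits at the same distance from its neighbour, including across the boundary between labels 0 and 1, so the labels do not correspond to any visible density separation. In the second dataset the nearest neighbours are all much closer than the gap between the two groups.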