python r reticulate

"ValueError: zero-size array to reduction operation maximum which has no identity" error when calling a Python function from R


I'm trying to use the Fast Density-Based Clustering Validation (DBCV) Python package from R through the reticulate R library, but I'm getting an error I cannot solve. I'm using a Dell computer running Linux Xubuntu 22.04.05, with Python 3.11.9 and R 4.3.1.

Here are my steps:

In a shell terminal, I create a Python virtual environment and then install the packages needed:

python3 -m venv dbcv_environment
dbcv_environment/bin/pip3 install scikit-learn numpy
dbcv_environment/bin/pip3 install "git+https://github.com/FelSiq/DBCV"

Then, in R, I install the needed R packages, select the Python environment I created, generate a sample dataset and its labels, and try to apply the dbcv() function:

setwd(".")
options(stringsAsFactors = FALSE)
options(repos = list(CRAN="http://cran.rstudio.com/"))

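list.of.packages <- c("pacman")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
# install any package from the list that is not installed yet
if (length(new.packages)) install.packages(new.packages)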

library("pacman")
p_load("reticulate")

data <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), nrow = 5, byrow = TRUE)
labels <- c(0,0,1,1,1)

use_virtualenv("./dbcv_environment")
dbcvLib <- import("dbcv")
dbcvLib$dbcv(X=data, y=labels)

But when I execute the last command, I get the following error:

Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  ValueError: zero-size array to reduction operation maximum which has no identity
Run `reticulate::py_last_error()` for details.

Does anybody know how to solve this problem? Any help will be appreciated, thanks!


Solution

  • The error

    ValueError: zero-size array to reduction operation maximum which has no identity
    

    likely originates from the structure of your example feature data rather than from reticulate: the message means that NumPy was asked to compute the maximum of an empty array, which suggests that the algorithm cannot identify any clusters in these points.
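
To see what the message itself means, independently of dbcv, here is a minimal sketch that triggers the same NumPy error by taking the maximum of an empty array. It only illustrates the error text; it does not show where inside the dbcv package the empty array arises, which I have not traced.

```python
import numpy as np

# An empty array has no elements, so "maximum" has no defined result
empty = np.array([])

try:
    np.max(empty)
    message = None
except ValueError as exc:
    # NumPy refuses the reduction and reports why
    message = str(exc)

print(message)
```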

    To check that there's nothing wrong with reticulate here, you can run the equivalent Python code directly (save the code below in a script and run it in a shell using your venv).

    import numpy as np
    from dbcv import dbcv
    
    # data matrix
    data = np.array([[1, 2],  
                     [3, 4],
                     [5, 6],
                     [7, 8],
                     [9, 10]])
    
    # labels array
    labels = np.array([0, 0, 1, 1, 1])
    
    # Calculate DBCV score
    result = dbcv(X=data, y=labels)
    print(result)
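
If you need to score many labelings and do not want one degenerate case to abort the whole run, one option is to catch the exception. The following is only a sketch: `safe_score` and `failing_score` are hypothetical names introduced here, and `failing_score` is a stand-in that mimics the reported failure so the example runs without the dbcv package. With the real package you would pass the `dbcv` function imported above as `score_fn`.

```python
import numpy as np

def safe_score(score_fn, X, y):
    """Return score_fn(X=X, y=y), or None if it raises ValueError."""
    try:
        return score_fn(X=X, y=y)
    except ValueError as exc:
        print(f"score could not be computed: {exc}")
        return None

def failing_score(X, y):
    # Hypothetical stand-in: reproduces the same NumPy error as in the question
    return np.max(np.array([]))

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]], dtype=float)
y = np.array([0, 0, 1, 1, 1])

result = safe_score(failing_score, X, y)
print(result)
```

Returning None keeps the failure visible to the caller instead of silently substituting a numeric score.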
    

    Running through reticulate works fine for data clearly containing clusters:

    data <- matrix(c(
      1, 1,    # Cluster 0
      1.2, 1.1,
      0.8, 1.2,
      4, 4,    # Cluster 1
      4.2, 4.1,
      3.8, 4.2
    ), ncol = 2, byrow = TRUE)
    
    labels <- c(0, 0, 0, 1, 1, 1)
    
    (
     result <- dbcvLib$dbcv(X=data, y=labels)
    )
    
    [1] 1