scikit-learn gpu k-means mini-batch

Drop-in scikit-learn KMeans replacement for GPU


I am looking for a good drop-in replacement for from sklearn.cluster import KMeans. Others online have suggested cuML from NVIDIA's RAPIDS package, but I was not able to compile or install it for Python 3.8 with CUDA 12.2. Other replacements tend not to take the same parameters as the scikit-learn class, which makes them hard to swap in. At the moment, my code uses MiniBatchKMeans from sklearn.cluster, but this uses multiprocessing and constantly takes up 100% CPU utilization, making it hard for others on the server to run their code.

I tried installing kmeans-gpu from PyPI, but it expects input with 3 channels. I also tried cuML's KMeans, but no version was available for my setup.


Solution

  • cuML is a great option for running KMeans on a GPU. However, you might need a newer Python version than the one listed in the question to make it work.

    The current version of cuML (23.08 as of this writing) supports only Python 3.9 and 3.10, not Python 3.8. You could instead try cuML 23.04, which does support Python 3.8.

    If you want to use the latest RAPIDS release with CUDA 12 support, try this:

    conda create --solver=libmamba -n rapids-23.08 -c rapidsai -c conda-forge -c nvidia  \
        rapids=23.08 python=3.10 cuda-version=12.0
    

    Note that this requires Python 3.9 or 3.10. As of this writing, only cuda-version=12.0 is supported, and only on x86-64 systems. However, systems with any CUDA 12 version (like 12.2) will support cuda-version=12.0 packages. See https://docs.rapids.ai/install for additional information about using the latest RAPIDS release.

    If you are limited to Python 3.8 and cannot upgrade, then you might need to use a conda environment or Docker container with CUDA 11, since CUDA 12 is not supported in cuML 23.04. Try this:

    conda create -n rapids-23.04 -c rapidsai -c conda-forge -c nvidia cuml=23.04 python=3.8 cuda-version=11.8
    

    Feel free to open an issue on https://github.com/rapidsai/cuml and tag me (bdice) if you'd like further installation assistance.
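
Once one of the environments above is installed, swapping the import is usually the only change needed, since cuml.cluster.KMeans follows the scikit-learn estimator interface for common parameters such as n_clusters and random_state. The following is a minimal sketch, not a definitive setup: the helper name load_kmeans, the fallback to scikit-learn, and the synthetic data are assumptions made for illustration, and less common parameters may differ between the two libraries.

```python
import importlib
import importlib.util


def load_kmeans():
    """Return (backend_name, KMeans class), preferring cuML over scikit-learn.

    Hypothetical helper for illustration; returns (None, None) if neither
    library can be imported.
    """
    for backend in ("cuml", "sklearn"):
        if importlib.util.find_spec(backend) is None:
            continue  # package is not installed in this environment
        try:
            module = importlib.import_module(f"{backend}.cluster")
        except Exception:
            # An installed package can still fail to import, for example
            # cuML on a machine without a working GPU driver.
            continue
        return backend, module.KMeans
    return None, None


backend, KMeans = load_kmeans()

if KMeans is not None:
    import numpy as np

    # Synthetic float32 data, used here only to exercise the estimator.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8)).astype(np.float32)

    # Same constructor arguments and method for both backends.
    model = KMeans(n_clusters=4, random_state=0)
    labels = model.fit_predict(X)
    print(f"backend={backend}, labelled {len(labels)} samples")
else:
    print("Neither cuML nor scikit-learn is installed")
```

The rest of the code does not need to know which backend was picked, which keeps the change easy to revert if the GPU environment is unavailable.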