I have cudf and numba installed. My *.py file itself does not rely on numba. Before I installed the cudf-related packages, my code worked fine. After installing them, running python3 -m cudf.pandas my_py_101.py leads to the following error:
[Actual outcome]
/usr/local/lib/python3.10/dist-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:
stdout:
stderr:
Traceback (most recent call last):
  File "<string>", line 7, in <module>
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'
Not patching Numba
  warnings.warn(msg, UserWarning)
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 10, in <module>
    validate_setup()
  File "/usr/local/lib/python3.10/dist-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
    cuda_runtime_version = runtimeGetVersion()
  File "/usr/local/lib/python3.10/dist-packages/rmm/_cuda/gpu.py", line 88, in runtimeGetVersion
    major, minor = numba.cuda.runtime.get_version()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 63, in open_cudalib
    path = get_cudalib(lib)
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cudadrv/libs.py", line 55, in get_cudalib
    libdir = get_cuda_paths()[dir_type].info
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 223, in get_cuda_paths
    'nvvm': _get_nvvm_path(),
  File "/usr/local/lib/python3.10/dist-packages/numba/cuda/cuda_paths.py", line 201, in _get_nvvm_path
    candidates = find_lib('nvvm', path)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 44, in find_lib
    return find_file(regex, libdir)
  File "/usr/local/lib/python3.10/dist-packages/numba/misc/findlib.py", line 56, in find_file
    entries = os.listdir(ldir)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/nvvm/lib64'
[What I did]
My Docker environment is built from the following Dockerfile:
FROM ubuntu:22.04
FROM nvidia/cuda:12.0.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y wget && apt-get install curl -y && apt-get install unzip && apt-get install python3-pip -y
ENV PATH=$PATH:~/.local/bin:~/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
RUN pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==23.12.* dask-cudf-cu12==23.12.* cuml-cu12==23.12.* cugraph-cu12==23.12.*
RUN pip install numpy==1.24.3 pandas==1.5.3 Cython==3.0.6 scikit-learn==1.3.2 swifter==1.3.4 requests==2.28.2 numba==0.57.1 scikit-learn-intelex==2024.0.1
RUN pip install torch torchvision torchaudio
About the numba package: I checked the dependency page and found that cudf requires numba>=0.57,<0.58, and I have numba==0.57.1. Note that I don't have any numba-related code in my script.
cudf requires CUDA 12.0, while I'm using CUDA 12.0.1, which is the closest version.
The YAML file used to start the Docker container is this:
apiVersion: batch/v1
kind: Job
metadata:
  name: test-cuda
  namespace: tom # job and pvc should be in the same namespace
spec:
  template:
    metadata:
      labels:
        app: test-cuda
    spec:
      containers:
      - name: test-cuda
        image: <my_url>/tom/valid:cudf
        command: ["bash", "-c", "tail /proc/cpuinfo -n 28 &>> job.log; python3 -m cudf.pandas my_py_101.py &>> job.log; echo 'test my_py & GPU' &>> job.log; mkdir result_my_py_20231229 ; mv job.log result_my_py_20231229/ ; tar -cjf result_my_py_20231229.bz2 result_my_py_20231229/ ; ls *.bz2; pwd ; aws s3 cp --endpoint http://<my_url> /result_my_py_20231229.bz2 s3://mybucket01/"]
        resources:
          requests:
            cpu: 9
            memory: 128Gi
          limits:
            cpu: 12
            memory: 256Gi
        imagePullPolicy: IfNotPresent #Always
      restartPolicy: Never
How can I fix it?
I have run into this issue before as a cuDF developer. I think you can fix it by changing one line in your Dockerfile. Try building your Docker image from the "devel" flavor of the CUDA containers instead of the "runtime" flavor:
FROM nvidia/cuda:12.0.1-devel-ubuntu22.04
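After rebuilding, you can check which flavor a container actually provides by looking for the compiler pieces that the devel images add on top of the runtime ones, namely nvcc and the NVVM library (libnvvm). The snippet below is a rough heuristic sketch, not part of cuDF, Numba, or the NVIDIA images. The helper name `cuda_image_flavor` and its "partial" label are hypothetical. It also assumes the standard /usr/local/cuda layout that appears in the traceback, so a custom toolkit location would not be detected correctly.

```python
import glob
import os

# Assumed standard toolkit location (the traceback shows Numba looking under it).
DEFAULT_CUDA_HOME = "/usr/local/cuda"


def cuda_image_flavor(cuda_home: str = DEFAULT_CUDA_HOME) -> str:
    """Heuristically label a CUDA install as "devel", "runtime", or "partial".

    Hypothetical helper: it only checks for two compiler components and does
    not inspect any image metadata.
    """
    # nvcc is the CUDA compiler driver; the runtime images leave it out.
    has_nvcc = os.path.isfile(os.path.join(cuda_home, "bin", "nvcc"))
    # libnvvm is the NVVM library that Numba's JIT compilation relies on.
    has_nvvm = bool(glob.glob(os.path.join(cuda_home, "nvvm", "lib64", "libnvvm.so*")))

    if has_nvcc and has_nvvm:
        return "devel"    # compiler pieces present
    if not has_nvcc and not has_nvvm:
        return "runtime"  # minimal install for running pre-built CUDA code
    return "partial"      # only one of the two found; inspect manually


if __name__ == "__main__":
    print(f"{DEFAULT_CUDA_HOME}: looks like a '{cuda_image_flavor()}' install")
```

Checking for files, rather than parsing image tags, keeps the sketch independent of how the image was named or pushed to your registry.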
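When you import cudf, it imports numba as a dependency. However, Numba fails at import time because it finds only part of its CUDA Toolkit requirements. As your traceback shows, cudf's import-time setup check (validate_setup) asks Numba for the CUDA runtime version. Numba's library lookup then fails because /usr/local/cuda/nvvm/lib64 does not exist. The runtime CUDA images are fairly minimal and don't include some of the NVVM pieces that Numba needs.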
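To confirm this diagnosis inside your current container, you can check for the directory named in the FileNotFoundError before running python3 -m cudf.pandas. The following is a minimal stdlib-only sketch. The `find_libnvvm` helper and its filename pattern are hypothetical illustrations. They only approximate how Numba searches for libnvvm and do not reproduce Numba's actual code.

```python
import os
import re

# Directory from the FileNotFoundError in the traceback.
DEFAULT_NVVM_LIBDIR = "/usr/local/cuda/nvvm/lib64"

# Approximation of the library-name pattern Numba searches for; this is an
# assumption for illustration, not Numba's exact regex.
NVVM_PATTERN = re.compile(r"^libnvvm\.so(\.\d+)*$")


def find_libnvvm(libdir: str = DEFAULT_NVVM_LIBDIR) -> list[str]:
    """Return libnvvm shared-library paths in libdir, or [] if libdir is missing."""
    if not os.path.isdir(libdir):
        # In the runtime image this directory is absent, which is why
        # Numba's os.listdir() call raises FileNotFoundError.
        return []
    return sorted(
        os.path.join(libdir, name)
        for name in os.listdir(libdir)
        if NVVM_PATTERN.match(name)
    )


if __name__ == "__main__":
    found = find_libnvvm()
    if found:
        print("NVVM found:", ", ".join(found))
    else:
        print(f"No libnvvm under {DEFAULT_NVVM_LIBDIR}; "
              "this image likely lacks the NVVM component Numba needs.")
```

In the runtime image this should report that nothing was found. After switching to the devel image, it should list the libnvvm shared library instead.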
Background: The cuDF library supports user-defined functions (UDFs) for features like df.apply. To execute user-defined Python code on the GPU, cuDF calls Numba to perform just-in-time (JIT) CUDA compilation. Numba requires some pieces of the CUDA Toolkit to do this, including NVVM. The CUDA Toolkit that comes with the nvidia/cuda "runtime" image does not include all the pieces that are needed, because NVVM and the related tools that Numba needs are considered compilers. The "runtime" images are meant to be as small as possible while still able to run pre-built CUDA code, so compilers are excluded. The "devel" flavor does contain NVVM and all the other components needed to build CUDA code, which includes Numba's JIT functionality.