
Lambda container - Pyarrow and numpy


I'm running into the same problem described in this issue: (aws-lambda-python-alpha): Failed to install numpy 2.3.0 with Python 3.11 or lower

My Dockerfile:

    FROM public.ecr.aws/lambda/python:3.11

    # Install
    RUN pip install 'numpy<2.3.0'
    RUN pip install 'pyarrow[s3]'

Installing pyarrow still fails here:

    Collecting numpy>=1.25
      Downloading numpy-2.3.2.tar.gz (20.5 MB)
      ...

I want to force pyarrow to use numpy==2.2.1, but I don't see how to do that from here. Do I need to lower the version of pyarrow?


Solution

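  • If you read your build log, you can see that pip is attempting to build pyarrow from source and failing because no C compiler is installed in the image. The numpy>=1.25 line most likely comes from pyarrow's own build requirements: pip installs those into a separate, isolated build environment, so the numpy<2.3.0 you installed in the previous step does not constrain it.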

    But why is it trying to build from source rather than installing a pre-built wheel? The Lambda image you are using ships glibc 2.26, while the pre-built wheels for pyarrow 21.0.0 require glibc 2.28 or newer.
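A quick way to see which side of that 2.28 threshold an image falls on is to ask its own Python interpreter. The following is a minimal stdlib sketch, not part of the original answer; the helper names glibc_version and supports_glibc_floor are hypothetical. It relies on platform.libc_ver(), which on glibc-based Linux reports the running C library version, and it only compares version numbers, while pip additionally checks CPU architecture and Python/ABI tags.

```python
import platform


def glibc_version():
    """Return the running glibc version as (major, minor), or None on non-glibc systems."""
    libc, version = platform.libc_ver()
    parts = version.split(".")
    if libc != "glibc" or len(parts) < 2:
        return None
    return int(parts[0]), int(parts[1])


def supports_glibc_floor(current, floor):
    """True if the current glibc is at least `floor`, e.g. (2, 28) for manylinux_2_28 wheels."""
    return current is not None and current >= floor


if __name__ == "__main__":
    current = glibc_version()
    print("glibc:", current)
    print("manylinux_2_28 wheels usable:", supports_glibc_floor(current, (2, 28)))
```

Run inside the Python 3.11 Lambda image, this should report glibc (2, 26) and therefore False for the 2.28 floor, which matches the explanation above.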

    That leaves you a few ways to solve this.

    Solution #1: Use older pyarrow

    This works because pyarrow 20.0.0 still ships a pre-built wheel for much older glibc versions, so pip can install it without compiling anything.

    FROM public.ecr.aws/lambda/python:3.11
    
    # Install
    RUN pip install 'numpy<2.3.0'
    RUN pip install 'pyarrow[s3]==20.0.0'
    

    Solution #2: Use newer base image

    The Python 3.12 image uses glibc 2.34, so it is compatible with recent versions of pyarrow.

    FROM public.ecr.aws/lambda/python:3.12
    
    # Install
    RUN pip install 'numpy<2.3.0'
    RUN pip install 'pyarrow[s3]==21.0.0'
    

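Whichever fix you pick, it can help to confirm what actually ended up in the image. Below is a hypothetical smoke-test handler (the function name handler and a file name such as app.py are assumptions, not from the answer). It reports the Python and glibc versions plus the installed numpy and pyarrow versions via importlib.metadata, and tries to import pyarrow so that a broken native build shows up. To use it you would also copy the file into the image and point CMD at it, following the usual pattern for the AWS base images (COPY app.py ${LAMBDA_TASK_ROOT} and CMD [ "app.handler" ]).

```python
import importlib
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def can_import(module_name):
    """True if the module, including any native extension it loads, imports without ImportError."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def handler(event, context):
    """Hypothetical Lambda entry point that reports what the image actually contains."""
    libc, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "libc": f"{libc} {libc_version}".strip(),
        "numpy": installed_version("numpy"),
        "pyarrow": installed_version("pyarrow"),
        "pyarrow_importable": can_import("pyarrow"),
    }
```

The AWS base images include the Runtime Interface Emulator, so running the image locally with docker run -p 9000:8080 and posting an empty event to http://localhost:9000/2015-03-31/functions/function/invocations should return this dictionary as JSON.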
    Solution #3: Build pyarrow from source

    Both of the previous solutions require changing the version of Python or PyArrow. What if that's a nonstarter?

    In theory, you could build a wheel compatible with your version of glibc by building pyarrow from source in an image based on glibc 2.26 or older. You could then copy that wheel into your Lambda image and install it. The Apache Arrow documentation includes a guide on building pyarrow from source.
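If you go down this route, the wheel's platform tag is worth checking before you copy it over. A wheel built with plain pip wheel is usually tagged linux_x86_64, which states no glibc floor, while a wheel repaired with a tool such as auditwheel carries manylinux tags that do. The following minimal sketch (hypothetical helper names, not from the original answer) parses a wheel filename using the PEP 427 layout, maps the legacy aliases to their PEP 600 glibc floors, and compares them against a target glibc such as (2, 26) for the Python 3.11 image. Like pip, it treats a compressed tag set as satisfied if any one tag matches; unlike pip, it ignores CPU architecture and Python/ABI tags.

```python
import re

# Legacy manylinux aliases and the glibc floors PEP 600 assigns to them.
LEGACY_MANYLINUX = {"manylinux1": (2, 5), "manylinux2010": (2, 12), "manylinux2014": (2, 17)}


def platform_tags(wheel_filename):
    """Return the platform tag(s) from a wheel filename laid out as in PEP 427."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {wheel_filename}")
    # name-version[-build]-python-abi-platform; project names use '_' instead of '-'.
    fields = wheel_filename[: -len(".whl")].split("-")
    if len(fields) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {wheel_filename}")
    # A compressed tag set like 'manylinux_2_17_x86_64.manylinux2014_x86_64' is '.'-joined.
    return fields[-1].split(".")


def glibc_floor(tag):
    """Minimum glibc (major, minor) a manylinux tag requires, or None for other tags."""
    for alias, floor in LEGACY_MANYLINUX.items():
        if tag.startswith(alias + "_"):
            return floor
    match = re.match(r"manylinux_(\d+)_(\d+)_", tag)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installable_on(wheel_filename, glibc):
    """True/False for manylinux wheels; None if no tag states a glibc floor (e.g. linux_x86_64)."""
    floors = [f for f in map(glibc_floor, platform_tags(wheel_filename)) if f is not None]
    if not floors:
        return None
    # Any satisfied tag is enough, so the lowest floor decides.
    return glibc >= min(floors)
```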