
How does one run Great Expectations from Docker using a Dockerfile to build the image


I am pretty new to Great Expectations (GX) and very new to Docker, and now I am trying to combine the two. I can get a Docker image to build just fine, but when I try to run a container, it fails. I can get my GX Checkpoint to run both from the GX CLI and from a Python file.

I have tried running a Docker image built from a Python base image (running the Python file from the image), as well as one built from a GX base image.

There is one point in the GX documentation that I think is important, quoted below:

You need to mount the local great_expectations directory into the container at /usr/app/great_expectations, and from there you can run all non-interactive commands, such as running checkpoints and listing items.
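For reference, the mount the documentation describes can be sketched as a small Python helper that builds the `docker run` argument list (`--rm` and `-v HOST:CONTAINER` are standard `docker run` flags). The helper name is hypothetical, and it assumes, as the `CMD` in the GX Dockerfile below implies, that the image's entrypoint is the `great_expectations` CLI, so only the subcommand follows the image name. Mounting only changes where the project files come from; it does not change which GX version runs inside the container.

```python
import os
from typing import List

# Mount point quoted from the GX Docker documentation.
CONTAINER_GX_DIR = "/usr/app/great_expectations"


def build_docker_run_cmd(local_gx_dir: str, image: str, checkpoint: str) -> List[str]:
    """Return argv for `docker run` that bind-mounts the local GX project.

    Hypothetical helper. Assumes the image's entrypoint is the great_expectations
    CLI, so only `checkpoint run <name>` is passed after the image name.
    """
    host_dir = os.path.abspath(local_gx_dir)  # -v bind mounts need an absolute host path
    return [
        "docker", "run", "--rm",                   # remove the container when it exits
        "-v", f"{host_dir}:{CONTAINER_GX_DIR}",    # local project -> documented mount point
        image,
        "checkpoint", "run", checkpoint,           # non-interactive CLI command
    ]
```

It could be run with `subprocess.run(build_docker_run_cmd("great_expectations", "greatexpectations/great_expectations:python-3.7-buster-ge-0.12.0", "data_checkpoint"), check=True)`. Compared with `ADD`, a bind mount means the image does not have to be rebuilt every time the project changes.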

I will describe the two approaches separately below:

Python Base Image

The Python Image version of my Dockerfile is basically:

FROM python:3.8-slim
COPY . ./src
RUN pip install -r ./src/requirements.txt
CMD ["python3", "./src/validate_data.py"]

(where my Python file that works outside of Docker is validate_data.py)

When I run this container, I get the following error:

Error: No great_expectations directory was found here!
    - Please check that you are in the correct directory or have specified the correct directory.
    - If you have never run Great Expectations in this project, please run `great_expectations init` to get started.
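This error means GX could not locate its project directory. As far as I can tell, a `DataContext` created without arguments looks for `great_expectations/great_expectations.yml` starting in the current working directory and moving up a few parent directories. The stdlib-only sketch below is an approximate model of that lookup (the function names are hypothetical and the real search may differ between GX versions). It assumes the container starts in `/`, since the Dockerfile sets no `WORKDIR`, and that the build context is the `great_expectations/` folder itself, so `great_expectations.yml` lands directly inside the `COPY` destination.

```python
import os
import tempfile
from typing import Optional


def find_gx_yml(start_dir: str, max_levels: int = 4) -> Optional[str]:
    """Approximate model of how a no-argument GX DataContext finds its config.

    Checks <dir>/great_expectations/great_expectations.yml, starting at
    start_dir and moving up one parent directory per step.
    """
    search_dir = os.path.abspath(start_dir)
    for _ in range(max_levels):
        candidate = os.path.join(search_dir, "great_expectations", "great_expectations.yml")
        if os.path.isfile(candidate):
            return candidate
        search_dir = os.path.dirname(search_dir)  # move up one directory
    return None


def simulate_image(copy_dest: str) -> Optional[str]:
    """Mimic `COPY . ./<copy_dest>` run from inside great_expectations/, then search from /."""
    with tempfile.TemporaryDirectory() as image_root:
        project = os.path.join(image_root, copy_dest)
        os.makedirs(project)
        # The build context is the great_expectations/ folder, so its yml lands in copy_dest.
        open(os.path.join(project, "great_expectations.yml"), "w").close()
        # Walking up from / only re-checks /, so one level models the container.
        found = find_gx_yml(image_root, max_levels=1)
        return os.path.relpath(found, image_root) if found else None


print(simulate_image("src"))                 # None
print(simulate_image("great_expectations"))  # great_expectations/great_expectations.yml
```

Under these assumptions, `COPY . ./src` leaves the config at `/src/great_expectations.yml`, which the search never checks.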

GX Base Image

The GX Image version of my Dockerfile (which is contained in my great_expectations/ folder) is similar to:

FROM greatexpectations/great_expectations:python-3.7-buster-ge-0.12.0
ADD . /usr/app/great_expectations
COPY . ./src
CMD ["checkpoint", "run", "data_checkpoint"]

(where my Checkpoint that works from the CLI outside of Docker is data_checkpoint)

Note: Before I added ADD . /usr/app/great_expectations to the Dockerfile, I was getting the same error as with the Python path.

I get the following error:

{'include_rendered_content': ['Unknown field.'], 'checkpoint_store_name': ['Unknown field.']}
Encountered errors during loading data context config. See ValidationError for more details.

Things I have tried:

Python Base Image

All the things I have tried:

No matter what I have tried, I get the same error.

GX Base Image

I found include_rendered_content and checkpoint_store_name in my great_expectations.yml config file. I commented out those lines because I was unsure of their utility, and I got a new error:

You appear to have an invalid config version (3.0). The maximum valid version is 2.

My guess is that I am getting these new errors because the GX base image is built on the v2 API of Great Expectations (0.12.0, per the image tag), while I have been using the v3 API to build out the GX testing infrastructure locally.
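The mismatch can be checked without starting a container by reading `config_version` from `great_expectations.yml` and comparing it with the maximum the image reports (2, per the error above). The sketch below uses hypothetical helper names and a regex instead of a YAML parser so it has no dependencies; it does not replicate GX's own config validation. It also illustrates why commenting out individual fields cannot help here: the declared version itself is newer than the image supports.

```python
import re
from typing import Optional

# Matches a top-level line such as "config_version: 3.0" (indented keys are ignored).
_CONFIG_VERSION = re.compile(r"^config_version:\s*([0-9]+(?:\.[0-9]+)?)", re.MULTILINE)


def read_config_version(yml_text: str) -> Optional[float]:
    """Return the top-level config_version, or None if the key is missing."""
    match = _CONFIG_VERSION.search(yml_text)
    return float(match.group(1)) if match else None


def image_can_load(yml_text: str, max_supported: float) -> bool:
    """True if the declared config version is within what the image accepts."""
    version = read_config_version(yml_text)
    return version is not None and version <= max_supported


yml = "config_version: 3.0\ndatasources: {}\n"
print(read_config_version(yml))                # 3.0
print(image_can_load(yml, max_supported=2))    # False
```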

That really makes me want to get the Python base image approach described above working, but that is the one where I have made less progress.


Solution

  • I am not sure if this is a legit solution or just a hack, but I was able to get round my problem by changing COPY . ./src in my Dockerfile to COPY . ./great_expectations, so my Dockerfile (which exists inside my great_expectations/ directory) now looks similar to this:

    FROM python:3.8-slim
    COPY . ./great_expectations
    RUN pip install -r ./great_expectations/requirements.txt
    CMD ["python3", "./great_expectations/validate_data.py"]
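The rename works because the project then sits at `/great_expectations/`, which is where a no-argument context looks when the container starts in `/`. A less location-dependent alternative is to give GX the project root explicitly. The sketch below is a hypothetical version of `validate_data.py`: it assumes the script sits next to `great_expectations.yml` (true here, since both are copied from the `great_expectations/` folder) and uses the v3-API `DataContext(context_root_dir=...)` and `run_checkpoint(checkpoint_name=...)`. With it, the original `COPY . ./src` should also work.

```python
from pathlib import Path


def context_root_for(script_path: str) -> Path:
    """Directory holding great_expectations.yml.

    Assumes the script lives in the same folder as great_expectations.yml.
    """
    return Path(script_path).resolve().parent


def run_checkpoint(checkpoint_name: str) -> bool:
    # Imported lazily so the path logic above can be tried without GX installed.
    from great_expectations.data_context import DataContext

    # An explicit root means the container's working directory and the COPY
    # destination no longer matter.
    context = DataContext(context_root_dir=str(context_root_for(__file__)))
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    return result.success
```

The script's entry point could then call `run_checkpoint("data_checkpoint")` and exit with a non-zero status when it returns `False`, so `docker run` reports a failed validation.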