
GitHub Packages: Inconsistency when running Docker Image Locally and in GitHub Actions


I have been using GitHub Actions to build and publish a Docker image to the GitHub Container Registry, following the documentation. When I pull the published image and test it locally, it behaves differently than it does in the workflow.

I have a CMake project in C++ that runs a simple hello world with an INTERFACE and SHARED library.

When I build a docker image locally and test it, this is the output (which is working fine):

*************************************
*** DBSCAN Cluster Segmentation *** 
*************************************
--cloudfile: required.
Usage: program [options] 

Optional arguments:
-h --help           shows help message and exits [default: false]
-v --version        prints version information and exits [default: false]
--cloudfile         input cloud file [required]
--octree-res        octree resolution [default: 120]
--eps               epsilon value [default: 40]
--minPtsAux         minimum auxiliar points [default: 5]
--minPts            minimum points [default: 5]
-o --output-dir     output dir to save clusters [default: "-"]
--ext               cluster output extension [pcd, ply, txt, xyz] [default: "pcd"]
-d --display        display clusters in the pcl visualizer [default: false]
--cal-eps           calculate the value of epsilon with the distance to the nearest n points [default: false]

In Github Actions I am using this workflow:

name: Demo Push

on:
  push:
    # Publish `master` as Docker `latest` image.
    branches: ["test-github-packages"]

    # Publish `v1.2.3` tags as releases.
    tags:
      - v*

  # Run tests for any PRs.
  pull_request:

env:
  IMAGE_NAME: dbscan-octrees

jobs:
  # Push image to GitHub Packages.
  # See also https://docs.docker.com/docker-hub/builds/
  push:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read

    steps:
      - uses: actions/checkout@v3
        with:
          submodules: recursive

      - name: Build image
        run: docker build --file Dockerfile --tag $IMAGE_NAME --label "runnumber=${GITHUB_RUN_ID}" .

      - name: Test image
        run: |
          docker run --rm \
          --env="DISPLAY" \
          --env="QT_X11_NO_MITSHM=1" \
          --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
          dbscan-octrees:latest

      - name: Log in to registry
        # This is where you will update the PAT to GITHUB_TOKEN
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u ${{ github.actor }} --password-stdin

      - name: Push image
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME

          # Change all uppercase to lowercase
          IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')
          # Use Docker `latest` tag convention
          [ "$VERSION" == "master" ] && VERSION=latest          
          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION
          docker tag $IMAGE_NAME $IMAGE_ID:latest
          docker push $IMAGE_ID:latest

The build and test steps complete without errors (check this run). The problem is with the image after it is pushed to the GitHub Container Registry: when I pull it and run it locally, the program crashes with an "Illegal instruction (core dumped)" error. I have tried to debug it, and there is no compilation error, link error, or anything similar. I suspect it is related to the linking of the SHARED library, but that is strange: if the image works when it is built and tested on the GitHub Actions runner, I don't understand why the pushed image fails.

I found this post, which suggests that the error might be caused by GitHub changing the container during the installation.

Hope someone can help me with this.

This is the output of the Test image step in the workflow: [image: workflow]

This is the error after pulling the newly pushed image and testing it locally: [image: error]

I have even compared the bad binary (the GitHub version inside the Docker image) with the good one (compiled locally) using ghex. The binary in the image pushed from GitHub is slightly larger than the good one.

[images: binary comparison, binary sizes]

Issue

CPU AVX instruction set not supported by local PC

Solution

Enable compilation flags in CMake to disable AVX support


Solution

  • Description

    After digging in with binary analysis tools and a debugger, I discovered that the problem was related to AVX support on the GitHub Actions runner's CPU. My computer does not support AVX instructions, so I had to add a compilation flag to my shared libraries to disable AVX. With this flag, the project is compiled on the GitHub Actions runner without AVX instructions or related CPU optimizations, which are otherwise available in the standard GitHub Actions environment.
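A quick way to confirm this kind of mismatch is to compare the CPU feature flags on both machines. The following is a minimal sketch, assuming a Linux x86 host where /proc/cpuinfo contains a `flags` line; `cpu_has_flag` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# cpu_has_flag FLAG [CPUINFO_FILE]
# Succeeds if FLAG appears as a whole word in the first "flags" line
# of a cpuinfo-style file (defaults to /proc/cpuinfo, Linux only).
cpu_has_flag() {
    local flag="$1"
    local cpuinfo="${2:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -e "$flag"
}

# Report a few instruction sets relevant to this problem.
for flag in avx avx2 fma; do
    if cpu_has_flag "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: NOT supported"
    fi
done
```

Running this on the local PC and in a workflow step should show whether the build machine offers instruction sets that the local machine lacks.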

    Analysis tools:

    log error

    Running the program under strace produced the following error:

    --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x55dcd7324bc0} ---
    +++ killed by SIGILL (core dumped) +++
    Illegal instruction (core dumped)
    

    This signal gave me an error code to search for. Since my project uses the Point Cloud Library (PCL), I found a solution for my specific case in this post: compile the project with -mno-avx.
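To check whether a given binary or shared library actually contains AVX code, one rough heuristic is to disassemble it and look for YMM registers, which were introduced with AVX. This is only a sketch, assuming GNU objdump with its default AT&T syntax; `count_ymm_lines` is a hypothetical helper and the path in the usage comment is a placeholder. It can miss AVX instructions that only touch XMM registers.

```shell
#!/usr/bin/env bash
# count_ymm_lines
# Reads a disassembly listing on stdin and prints the number of lines
# that reference a YMM register (for example %ymm0).
count_ymm_lines() {
    # grep -c exits non-zero when the count is 0, so ignore its status.
    grep -cE '%ymm[0-9]+' || true
}

# Usage (the path is a placeholder):
#   objdump -d --no-show-raw-insn /path/to/libexample.so | count_ymm_lines
```

A non-zero count for the library built on the runner and zero for the locally built one would be consistent with the SIGILL seen above.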

    Solution

    In the CMakeLists.txt file of each SHARED library, add the following compile option:

    target_compile_options(${PROJECT_NAME} PUBLIC -mno-avx)
    

    New issue

    I resolved the main issue, but one of my shared libraries still fails with the same error. I think one of these flags should fix it.

    After a lot of testing, I checked my CPU's features with the CPU-X software and detected the architecture-specific options for my PC with the following GCC command:

        gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
    
        
        output:
    
        /usr/lib/gcc/x86_64-linux-gnu/9/cc1 -E -quiet -v -imultiarch 
        x86_64-linux-gnu - -march=haswell -mmmx -mno-3dnow -msse -msse2 
        -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -mno-aes -mno-sha 
        -mpclmul -mpopcnt -mabm -mno-lwp -mno-fma -mno-fma4 -mno-xop 
        -mno-bmi -mno-sgx -mno-bmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm 
        -mno-avx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle 
        -mrdrnd -mno-f16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx 
        -mfxsr -mno-xsave -mno-xsaveopt -mno-avx512f -mno-avx512er 
        -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt 
        -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl 
        -mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps 
        -mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku 
        -mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni 
        -mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq 
        -mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote 
        -mno-ptwrite --param l1-cache-size=32 
        --param l1-cache-line-size=64 --param l2-cache-size=3072 
        -mtune=haswell -fasynchronous-unwind-tables 
        -fstack-protector-strong -Wformat -Wformat-security 
        -fstack-clash-protection -fcf-protection
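The output above is easier to work with when reduced to just the instruction-set switches. The following sketch filters a cc1 command line down to its `-m<feature>` and `-mno-<feature>` options, one per line; `list_isa_flags` is a hypothetical helper name, and the pattern is an assumption about how these switches are spelled.

```shell
#!/usr/bin/env bash
# list_isa_flags
# Reads a compiler command line on stdin and prints only the
# -m<feature> / -mno-<feature> switches, one per line, without duplicates.
# Options that carry a value (such as -march=... or -mtune=...) are skipped
# because the pattern does not allow an "=" sign.
list_isa_flags() {
    tr -s ' \t' '\n' | grep -E '^-m(no-)?[a-z0-9.]+$' | sort -u
}

# Usage, reusing the command from the text:
#   gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | list_isa_flags
```

The resulting list can then be compared against the flags passed to target_compile_options for the shared library.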
    

    Final solution

    I have fixed the execution error with the following flags in my SHARED library:

    # MMX, SSE(1, 2, 3, 3S, 4.1, 4.2), CLMUL, RdRand, VT-x, x86-64
    target_compile_options(${PROJECT_NAME} PRIVATE -Wno-cpp
        -mmmx
        -msse
        -msse2
        -msse3
        -mssse3
        -msse4.2
        -msse4.1
        -mno-sse4a
        -mno-avx
        -mno-avx2
        -mno-fma
        -mno-fma4
        -mno-f16c
        -mno-xop
        -mno-bmi
        -mno-bmi2
        -mrdrnd
        -mno-3dnow
        -mlzcnt
        -mfsgsbase
        -mpclmul
    )
    

    Now, the docker image stored in the GitHub Container Registry is working as expected on my local PC.
