I have been using GitHub Actions to build and publish a Docker image to the GitHub Container Registry, following the documentation. I am getting inconsistent behavior when I pull the new image and test it locally.
I have a C++ CMake project that runs a simple hello world with an INTERFACE library and a SHARED library.
When I build the Docker image locally and test it, this is the output (which is correct):
*************************************
*** DBSCAN Cluster Segmentation ***
*************************************
--cloudfile: required.
Usage: program [options]
Optional arguments:
-h --help shows help message and exits [default: false]
-v --version prints version information and exits [default: false]
--cloudfile input cloud file [required]
--octree-res octree resolution [default: 120]
--eps epsilon value [default: 40]
--minPtsAux minimum auxiliar points [default: 5]
--minPts minimum points [default: 5]
-o --output-dir output dir to save clusters [default: "-"]
--ext cluster output extension [pcd, ply, txt, xyz] [default: "pcd"]
-d --display display clusters in the pcl visualizer [default: false]
--cal-eps calculate the value of epsilon with the distance to the nearest n points [default: false]
In GitHub Actions I am using this workflow:
name: Demo Push
on:
  push:
    # Publish `master` as Docker `latest` image.
    branches: ["test-github-packages"]
    # Publish `v1.2.3` tags as releases.
    tags:
      - v*
  # Run tests for any PRs.
  pull_request:
env:
  IMAGE_NAME: dbscan-octrees
jobs:
  # Push image to GitHub Packages.
  # See also https://docs.docker.com/docker-hub/builds/
  push:
    runs-on: ubuntu-latest
    permissions:
      packages: write
      contents: read
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Build image
        run: docker build --file Dockerfile --tag $IMAGE_NAME --label "runnumber=${GITHUB_RUN_ID}" .
      - name: Test image
        run: |
          docker run --rm \
            --env="DISPLAY" \
            --env="QT_X11_NO_MITSHM=1" \
            --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
            dbscan-octrees:latest
      - name: Log in to registry
        # This is where you will update the PAT to GITHUB_TOKEN
        run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u $ --password-stdin
      - name: Push image
        run: |
          IMAGE_ID=ghcr.io/${{ github.repository_owner }}/$IMAGE_NAME
          # Change all uppercase to lowercase
          IMAGE_ID=$(echo $IMAGE_ID | tr '[A-Z]' '[a-z]')
          # Strip git ref prefix from version
          VERSION=$(echo "${{ github.ref }}" | sed -e 's,.*/\(.*\),\1,')
          # Strip "v" prefix from tag name
          [[ "${{ github.ref }}" == "refs/tags/"* ]] && VERSION=$(echo $VERSION | sed -e 's/^v//')
          # Use Docker `latest` tag convention
          [ "$VERSION" == "master" ] && VERSION=latest
          echo IMAGE_ID=$IMAGE_ID
          echo VERSION=$VERSION
          docker tag $IMAGE_NAME $IMAGE_ID:latest
          docker push $IMAGE_ID:latest
The build and test steps run with no errors (check this run). The problem is with the image after it is pushed to the GitHub Container Registry: when I pull it and test it locally, the program crashes with an "Illegal instruction (core dumped)" error. I have tried to debug the problem, and it is not a compilation error, a link error, or anything like that. I suspect it is related to the linking of the SHARED library, but that is strange: if the image works when it is built and run on the GitHub Actions runner, I don't understand why the pushed image fails.
I found this post, which suggests the error might be related to GitHub changing the container during the installation.
Hope someone can help me with this.
This is the output of the Test image step in the workflow: [image: workflow]
This is the error after pulling the newly generated image and testing it locally: [image: error]
I have even compared the bad binary (the GitHub version from the Docker image) with the good one (compiled locally) using ghex, and the binary generated by GitHub after pushing a new image is a little bigger than the good one.
[images: binary comparison, binary sizes]
CPU AVX instruction set not supported by local PC
Enable compilation flags in CMake to disable AVX support
After digging with binary analysis tools, debugging, etc., I discovered that the problem was related to AVX support on the GitHub Actions runner's CPU. My computer's CPU does not support AVX instructions, so a binary built on the runner with AVX enabled crashes on it. To fix this, I had to add a compilation flag to my shared libraries that disables AVX. The flag tells the compiler on the GitHub Actions runner to build the project without AVX instructions, even though the runner's CPU supports them.
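A quick way to confirm this kind of mismatch on a Linux x86 machine is to look for the avx flag in /proc/cpuinfo. The following is a minimal bash sketch, not part of the original setup. The function name cpu_has_flag is hypothetical, and the optional second argument exists only so the check can be run against a cpuinfo dump saved from another machine.

```shell
# Report whether a CPU feature flag (e.g. avx) is listed in a cpuinfo-style file.
# Assumption: Linux on x86, where /proc/cpuinfo contains a "flags" line.
cpu_has_flag() {
  local flag="$1"
  local cpuinfo="${2:-/proc/cpuinfo}"
  # Take the first "flags" line, put each token on its own line,
  # and look for an exact match (so "avx" does not match "avx2").
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | tr ' \t' '\n\n' | grep -qx -- "$flag"
}

if cpu_has_flag avx; then
  echo "avx: supported"
else
  echo "avx: not supported"
fi
```

Running the same check inside a workflow step and on the local PC would show whether the two machines differ.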
Analysis tools:
ldd binary
strace binary <-- this one allowed me to identify the signal and its error code
Using the strace tool I got the following error:
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x55dcd7324bc0} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)
This output gave me the error code, and after searching the internet I found a solution to my specific problem: since my project uses the Point Cloud Library (PCL), I compiled it with -mno-avx, according to this post.
In the CMakeLists.txt file, define the following compilation flag for each SHARED library:
target_compile_options(${PROJECT_NAME} PUBLIC -mno-avx)
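After rebuilding with this flag, one rough way to check whether a binary or shared library still contains AVX code is to disassemble it and count the lines that use ymm registers. This is only a heuristic sketch in bash: count_ymm_lines is a hypothetical helper, ./binary is a placeholder path, and a count of zero does not prove the absence of AVX, because AVX instructions can also operate on xmm registers.

```shell
# Count disassembly lines (read from stdin) that reference ymm registers.
# Works for both AT&T (%ymm0) and Intel (ymm0) syntax.
count_ymm_lines() {
  # grep -c prints the number of matching lines; it exits non-zero when
  # that number is 0, so "|| true" keeps the function's status clean.
  grep -c -E 'ymm[0-9]+' || true
}

# Usage (requires binutils; ./binary is a placeholder):
#   objdump -d ./binary | count_ymm_lines
```

A non-zero count for a library that was supposed to be built with -mno-avx suggests that the flag did not reach every target.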
That resolved the main issue, but one of my shared libraries still had the same error. I will try to fix it with one of these flags (I think).
After a lot of testing, and with the help of the CPU-X software, I detected the proper architecture-specific options for my PC with the following GCC command:
gcc -march=native -E -v - </dev/null 2>&1 | grep cc1
output:
/usr/lib/gcc/x86_64-linux-gnu/9/cc1 -E -quiet -v -imultiarch
x86_64-linux-gnu - -march=haswell -mmmx -mno-3dnow -msse -msse2
-msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -mno-aes -mno-sha
-mpclmul -mpopcnt -mabm -mno-lwp -mno-fma -mno-fma4 -mno-xop
-mno-bmi -mno-sgx -mno-bmi2 -mno-pconfig -mno-wbnoinvd -mno-tbm
-mno-avx -mno-avx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle
-mrdrnd -mno-f16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx
-mfxsr -mno-xsave -mno-xsaveopt -mno-avx512f -mno-avx512er
-mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt
-mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl
-mno-avx512ifma -mno-avx512vbmi -mno-avx5124fmaps
-mno-avx5124vnniw -mno-clwb -mno-mwaitx -mno-clzero -mno-pku
-mno-rdpid -mno-gfni -mno-shstk -mno-avx512vbmi2 -mno-avx512vnni
-mno-vaes -mno-vpclmulqdq -mno-avx512bitalg -mno-avx512vpopcntdq
-mno-movdiri -mno-movdir64b -mno-waitpkg -mno-cldemote
-mno-ptwrite --param l1-cache-size=32
--param l1-cache-line-size=64 --param l2-cache-size=3072
-mtune=haswell -fasynchronous-unwind-tables
-fstack-protector-strong -Wformat -Wformat-security
-fstack-clash-protection -fcf-protection
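The output is long, so it can help to split it into enabled and disabled options before copying them into CMake. Below is a minimal bash sketch. list_isa_flags is a hypothetical helper name, and the sample input is a shortened excerpt of the output above.

```shell
# Read a cc1 command line on stdin and print its -m options one per line.
# Mode "enabled" prints options such as -msse2; mode "disabled" prints -mno-* options.
# -march= and -mtune= are skipped because they select a CPU model, not a single feature.
list_isa_flags() {
  local mode="$1"
  tr -s '[:space:]' '\n' | grep -E '^-m' | grep -Ev '^-m(arch|tune)=' |
    if [ "$mode" = "disabled" ]; then
      grep -E '^-mno-'
    else
      grep -Ev '^-mno-'
    fi
}

# Sample run on a shortened excerpt:
printf '%s\n' '-march=haswell -mmmx -mno-3dnow -msse -mno-avx -mtune=haswell' |
  list_isa_flags enabled
# prints:
# -mmmx
# -msse
```

On a real machine the helper could be fed directly, for example: gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | list_isa_flags enabled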
I have fixed the execution error with the following flags in my SHARED library:
# MMX, SSE(1, 2, 3, 3S, 4.1, 4.2), CLMUL, RdRand, VT-x, x86-64
target_compile_options(${PROJECT_NAME} PRIVATE -Wno-cpp
-mmmx
-msse
-msse2
-msse3
-mssse3
-msse4.2
-msse4.1
-mno-sse4a
-mno-avx
-mno-avx2
-mno-fma
-mno-fma4
-mno-f16c
-mno-xop
-mno-bmi
-mno-bmi2
-mrdrnd
-mno-3dnow
-mlzcnt
-mfsgsbase
-mpclmul
)
Now, the docker image stored in the GitHub Container Registry is working as expected on my local PC.