I am building an image in a container inside Kubernetes using kaniko. The build job gets OOMKilled while fetching the source context from a remote git repository. I am using the latest version of the kaniko executor image (gcr.io/kaniko-project/executor:latest) and my worker node has 8GB of RAM.
The Dockerfile for my image is located in a remote git repository, and I am using the following build arguments:
args = [
    "--dockerfile=/images/Containerfile",
    "--context=git://gitRepo.git#refs/heads/main",
    f"--cache={False}",
    "--verbosity=debug",
    f"--cache-copy-layers={False}",
    f"--compressed-caching={False}",
    "--use-new-run",
    "--destination=mydestination",
    # plus a bunch of build args
]
When running the build job, I see the following logs:
DEBU[0000] Getting source context from git://repo.git#refs/heads/main
DEBU[0000] Getting source from reference
Enumerating objects: 944, done.
Counting objects: 100% (879/879), done.
Compressing objects: 100% (464/464), done.
The build job exits with an OOMKilled error at the point where kaniko fetches the source context from the remote git repository. I was able to build normally not so long ago. The error started after I added a large 2Gi SQL file to the same repo/source context, and it persists even after I removed the large file. I now get the error with all versions of kaniko.
I feel like the error is related to caching, and I've tried setting --compressed-caching to False as suggested in issues 2491 and 1333. I don't have a problem accessing the repo, as all permissions work; the problem occurs while downloading the context. One point to note: when I use a 16Gi node to run this container, it works 50% of the time. When I checked the usage on a run that worked, it used close to 12 to 15Gi of memory only at the start, and about 2Gi for the rest of the build until it finished.
Any suggestions on how to resolve this issue would be greatly appreciated.
Short Version:
I ended up using a different git repo as the source context, less than 100MB in size, instead of the original git context of more than 2Gi.
Longer Version:
The issue started right after I added the large SQL files to the original git source context. Kaniko was using 12+ Gi of memory. Using a 16Gi memory instance in the k8s cluster worked 50% of the time. Naturally, I removed the large files from the source context, expecting that to fix it.
But even after I removed the large files from the repository/source context, the problem was not resolved. This led me to believe that there was a caching problem, which is when I decided to set --cache and --compressed-caching to false as mentioned in the comments. However, even with caching disabled the issue persisted. I may be wrong, but I believe that somehow there was an issue with the repository itself.
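One thing worth checking here (an assumption, not something confirmed for this repository): deleting a file in a new commit does not remove its blob from git history, so a full clone can still transfer the old 2Gi object. A minimal sketch for listing the largest blobs still reachable in a local clone is below; the function names and the repo path are hypothetical, and it only relies on the standard git rev-list and git cat-file commands.

```python
import subprocess


def parse_batch_check(output):
    """Parse '<type> <sha> <size> <path>' lines; return blobs as (size, path), largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path may contain spaces, so split at most 3 times
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and malformed lines
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((int(parts[2]), path))
    return sorted(blobs, reverse=True)


def largest_blobs(repo_dir, top=10):
    """List the largest blobs reachable from any ref in the local clone at repo_dir."""
    # Every object reachable from any ref, as '<sha> <path>' lines.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and size of each object; %(rest) echoes the path back.
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes)[:top]


# Example (hypothetical path to a local clone of the context repo):
# for size, path in largest_blobs("/path/to/clone"):
#     print(f"{size / 2**20:10.1f} MiB  {path}")
```

If the removed SQL file still shows up in this list, the history itself is still large, which would be consistent with the clone staying expensive after the file was deleted from the working tree.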
I switched to a different git source context that contained only the essential files kaniko needs (Dockerfile, nginx config files, etc.), which reduced the repository size to less than 100MB, and this worked!
I still don't know exactly why kaniko was using so much memory to clone the files. I am still investigating and will post the reason here once I find out, probably by next week.
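To compare the size of two candidate context repositories before handing one to kaniko, something like the following sketch could be used on local clones. It parses the output of git count-objects -v, which reports sizes in KiB; the helper names, the path, and the 100 MiB threshold are illustrative assumptions, and the size of a local clone is only an approximation of what a fresh clone will transfer.

```python
import subprocess


def parse_count_objects(output):
    """Turn 'key: value' lines from `git count-objects -v` into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[key.strip()] = int(value)
    return stats


def stored_size_mib(stats):
    """Loose plus packed object size in MiB (git reports both fields in KiB)."""
    return (stats.get("size", 0) + stats.get("size-pack", 0)) / 1024


def repo_size_mib(repo_dir):
    """Run `git count-objects -v` in a local clone and return its object size in MiB."""
    output = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return stored_size_mib(parse_count_objects(output))


# Example (hypothetical path and threshold):
# if repo_size_mib("/path/to/clone") > 100:
#     print("context repo is larger than 100 MiB; consider a slimmer repo")
```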