I'm looking for ways to optimize the build time of our singularity HPC containers. I know that I can save some time by building them layer by layer. But still, there is room for optimization.
What I'm interested in is using/caching whatever makes sense on the host system.
I did some experiments but haven't succeeded on any of these points.
What I found so far:
CCache
I install ccache in the container and instruct the build system to use it. Because I'm running singularity build with sudo, I expected the cache to end up under /root. But after running the build, /root/.ccache is empty. I verified the generated CMake build files, and they definitely use ccache.
I even created a test recipe containing a %post section with touch "$HOME/.ccache/test", but the test file did not appear anywhere on the host system (neither in /root nor in my user's home). Does the build step mount a container-backed directory to /root instead of the host's root dir?
Is there something more needed to be done to utilize ccache?
Git
People suggest running e.g. git-cache-http-server (https://stackoverflow.com/a/43643622/1076564) and pointing git at it: git config --global url."http://gitcache:1234/".insteadOf https://
Since singularity can read parts of the host filesystem, I think there could even be a way to make this work without a proxy program. However, if the host git repos are not inside $HOME or /tmp, how can singularity access them during build? singularity build has no --bind flag to specify additional mount directories. And using the %files section in the recipe sounds inefficient, since it would copy everything each time the build is run.
APT
People suggest using e.g. squid-deb-proxy (https://gist.github.com/dergachev/8441335). Again, since singularity is able to read host filesystem files, I'd like to just utilize the host's /var/cache/apt. But /var is not mounted into the container by default. So the same question again: how do I mount /var/cache/apt during container build time? And is it a good idea overall? Wouldn't it damage the APT cache of the host, given both host and container are based on the same version of Ubuntu and architecture?
Or does singularity do some clever APT caching itself? I've just noticed it downloaded 420 MB of packages in 25 seconds, which is possible on my connection, but not very probable given the standard speed of ubuntu mirrors.
Edit: I've created an issue on singularity repo: https://github.com/hpcng/singularity/issues/5352 .
It shows there is a way to utilize some caches on the host. As stated by one of the singularity developers, the host's /tmp is mounted during the %post phase of the build, and it is not possible to mount any other directory.
So utilizing the host's caches is all about making the data accessible from /tmp.
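A minimal sketch of the host-side steps, assuming a bind mount is used to expose a host directory under /tmp for the duration of the build. The function names, the DRY_RUN switch, and the image/recipe file names in the usage comment are illustrative and not part of singularity.

```shell
# Run a command, or only print it when DRY_RUN=1 (useful for checking the steps first).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Bind-mount host directory $1 at /tmp/$2 so the %post phase can see it.
bind_into_tmp() {
    run sudo mkdir -p "/tmp/$2"
    run sudo mount --bind "$1" "/tmp/$2"
}

# Undo bind_into_tmp for /tmp/$1 once the build has finished.
unbind_from_tmp() {
    run sudo umount "/tmp/$1"
}

# Possible usage (file names are placeholders):
#   bind_into_tmp /root/.ccache ccache
#   sudo singularity build image.sif recipe.def
#   unbind_from_tmp ccache
```

Unmounting afterwards keeps the host's /tmp from holding a stale view of the cache directory between builds.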
Before running the build command, mount the ccache directory into /tmp:
sudo mkdir /tmp/ccache
sudo mount --bind /root/.ccache /tmp/ccache
Then add the following line to your recipe's %post and you're done:
export CCACHE_DIR=/tmp/ccache
I'm not sure how sharing the cache with your user rather than root would work, but I assume the ccache documentation on sharing caches could help (especially setting umask for ccache).
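A minimal sketch of what the %post side could look like, assuming the host cache was bind-mounted at /tmp/ccache as described above. The helper name pick_ccache_dir is hypothetical, the fallback path is the /root/.ccache location mentioned in the question, and the umask value is only a guess at a group-shared setup.

```shell
# Hypothetical helper: print the ccache directory the build should use.
# $1 is the shared directory (assumed to be the bind mount at /tmp/ccache).
pick_ccache_dir() {
    candidate="${1:-/tmp/ccache}"
    if [ -d "$candidate" ] && [ -w "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        # No usable shared cache: fall back to a cache inside the container.
        printf '%s\n' "${HOME:-/root}/.ccache"
    fi
}

CCACHE_DIR="$(pick_ccache_dir /tmp/ccache)"
export CCACHE_DIR
# Assumption: a group-writable umask so root and a regular user can share cache files.
export CCACHE_UMASK=002
```

With the fallback, the same recipe still builds on a machine where the bind mount was not set up; it just starts with a cold cache.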
On the host, bind the apt cache dir:
sudo mkdir /tmp/apt
sudo mount --bind /var/cache/apt /tmp/apt
In your %setup or %post, create the container file /etc/apt/apt.conf.d/singularity-cache.conf with the following contents:
Dir::Cache "/tmp/apt";
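A minimal sketch of creating that file from %post, assuming the host's /var/cache/apt was bind-mounted at /tmp/apt. The helper name write_apt_cache_conf is hypothetical; the setting is written in the quoted, semicolon-terminated form that the apt.conf manual describes.

```shell
# Hypothetical helper: write an APT config file that redirects the package cache.
# $1 = config file to create, $2 = cache directory (assumed bind mount of /var/cache/apt).
write_apt_cache_conf() {
    conf_file="$1"
    cache_dir="${2:-/tmp/apt}"
    mkdir -p "$(dirname "$conf_file")"
    printf 'Dir::Cache "%s";\n' "$cache_dir" > "$conf_file"
}

# In %post this could be called before the first apt-get invocation:
#   write_apt_cache_conf /etc/apt/apt.conf.d/singularity-cache.conf /tmp/apt
#   apt-get update
```

Writing the file before any apt-get call matters, because packages downloaded earlier would still land in the container's own cache.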
The git-cache-http-server should work seamlessly, since host ports should be accessible during build. I just did not use it in the end as it doesn't support SSH auth. Another way would be to manually clone all repos to /tmp and then clone in the build process with the --reference flag, which should speed up the clone.
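A minimal sketch of the --reference approach, assuming a mirror of each repository was cloned somewhere under /tmp on the host beforehand. The helper name clone_with_reference and the paths in the usage comment are hypothetical. The --dissociate flag is an addition not mentioned above: it copies the borrowed objects into the new clone, so the result should not depend on the /tmp mirror once the build is over.

```shell
# Hypothetical helper: clone $1 into $3, borrowing objects from local mirror $2 if it exists.
clone_with_reference() {
    url="$1"
    mirror="$2"
    dest="$3"
    if [ -d "$mirror" ]; then
        # Reuse objects from the mirror, then detach from it so the clone is standalone.
        git clone --reference "$mirror" --dissociate "$url" "$dest"
    else
        # No mirror available: fall back to a plain clone over the network.
        git clone "$url" "$dest"
    fi
}

# Possible usage in %post (URL and paths are placeholders):
#   clone_with_reference https://example.com/project.git /tmp/git-mirrors/project.git /opt/project
```

On the host, the mirror could be created once with git clone --mirror and refreshed with git fetch before each build.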