caching singularity-container ccache

How to utilize host caches in a singularity build?


I'm looking for ways to optimize the build time of our singularity HPC containers. I know that I can save some time by building them layer by layer. But still, there is room for optimization.

What I'm interested in is using/caching whatever makes sense on the host system.

  1. CCache for C++ build artifact caching
  2. git repo cloning
  3. APT package downloads

I did some experiments but haven't succeeded on any of these points.

What I found so far:

CCache

I install ccache in the container and instruct the build system to use it. I know that because I'm running singularity build with sudo, the cache would be under /root. But after running the build, /root/.ccache is empty. I verified the generated CMake build files, and they definitely use ccache.

I even created a test recipe with a %post section containing

touch "$HOME/.ccache/test"

but the test file did not appear anywhere on the host system (not in /root and not in my user's home). Does the build step mount a container-backed directory to /root instead of the host's root dir?

Is there something more needed to be done to utilize ccache?

Git

People suggest running e.g. git-cache-http-server (https://stackoverflow.com/a/43643622/1076564) and using git config --global url."http://gitcache:1234/".insteadOf https://.

Since singularity can read parts of the host filesystem, I think there could even be a way to have it working without a proxy program. However, if the host git repos are not inside $HOME or /tmp, how can singularity access them during build? singularity build has no --bind flag to specify additional mount directories. And using the %files section of the recipe sounds inefficient, since it would copy everything each time the build is run.

APT

People suggest using e.g. squid-deb-proxy (https://gist.github.com/dergachev/8441335). Again, since singularity is able to read host filesystem files, I'd like to just utilize the host's /var/cache/apt. But /var is not mounted to the container by default. So the same question again: how do I mount /var/cache/apt during container build time? And is it a good idea overall? Wouldn't it damage the APT cache of the host, given both host and container are based on the same version of Ubuntu and architecture?

Or does singularity do some clever APT caching itself? I've just noticed it downloaded 420 MB of packages in 25 seconds, which is possible on my connection, but not very probable given the usual speed of Ubuntu mirrors.


Edit: I've created an issue on the singularity repo: https://github.com/hpcng/singularity/issues/5352 .


Solution

  • It turns out there is a way to utilize some caches on the host. As stated by one of the singularity developers, the host's /tmp is mounted during the %post phase of the build, and it is not possible to mount any other directory.

    So utilizing the host's caches is all about making the data accessible from /tmp.

    CCache

    Before running the build command, mount the ccache directory into /tmp:

    sudo mkdir /tmp/ccache
    sudo mount --bind /root/.ccache /tmp/ccache
    

    Then add the following line to your recipe's %post and you're done:

    export CCACHE_DIR=/tmp/ccache
    

    I'm not sure how sharing the cache with your user and not root would work, but I assume the documentation on sharing caches could help (especially setting umask for ccache).
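
To make the %post side a bit more robust, here is a minimal sketch of a helper that only switches to the host cache when the bind mount is actually there. The function name use_host_ccache and the fallback behaviour are my own assumptions for illustration; they are not part of singularity or ccache. It is written for plain /bin/sh, so its variables are global.

```shell
# Hypothetical helper for the recipe's %post section.
# Exports CCACHE_DIR only if the bind-mounted directory exists and is
# writable; otherwise it leaves ccache on its default cache location.
use_host_ccache() {
    cache_dir="${1:-/tmp/ccache}"
    if [ -d "$cache_dir" ] && [ -w "$cache_dir" ]; then
        export CCACHE_DIR="$cache_dir"
        echo "ccache: using $CCACHE_DIR"
    else
        # Most likely the bind mount was not set up on the host before the build.
        echo "ccache: $cache_dir not available, using the default cache" >&2
        return 1
    fi
}
```

In %post this would be called as use_host_ccache /tmp/ccache || true before the build system runs. Running ccache -s at the end of %post prints the cache statistics, which is a quick way to check whether the host cache was really hit.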

    APT

    On the host, bind the apt cache dir:

    sudo mkdir /tmp/apt
    sudo mount --bind /var/cache/apt /tmp/apt
    

    In your %setup or %post, create the container file /etc/apt/apt.conf.d/singularity-cache.conf with the following contents (%setup runs on the host, so there the path has to be prefixed with $SINGULARITY_ROOTFS):

    Dir::Cache "/tmp/apt";
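
A minimal sketch of how the file could be written from the recipe follows. The helper name write_apt_cache_conf and its arguments are assumptions for illustration. The first argument is the root of the container filesystem: an empty string when called from %post, or "$SINGULARITY_ROOTFS" when called from %setup.

```shell
# Hypothetical helper that writes the APT cache override into the container.
#   $1  container root ("" in %post, "$SINGULARITY_ROOTFS" in %setup)
#   $2  cache directory as seen during the build (defaults to /tmp/apt)
write_apt_cache_conf() {
    rootfs="$1"
    cache_dir="${2:-/tmp/apt}"
    conf_dir="$rootfs/etc/apt/apt.conf.d"
    mkdir -p "$conf_dir"
    # apt.conf(5) syntax: quoted value followed by a semicolon.
    printf 'Dir::Cache "%s";\n' "$cache_dir" > "$conf_dir/singularity-cache.conf"
}
```

In %post the call would be write_apt_cache_conf "" /tmp/apt, placed before the first apt-get command. It may be worth deleting the file again at the end of %post, since /tmp/apt will not exist when the finished image is run.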
    

    Git

    The git-cache-http-server should work seamlessly, since host ports should be accessible during build. I just did not use it in the end as it doesn't support SSH auth. Another way would be to manually clone all repos to /tmp on the host and then clone in the build process with the --reference flag, which should speed up the clone.
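
The --reference variant could look roughly like the sketch below. I have not used this in my builds, so treat it as an illustration only. The helper name clone_with_reference and the cache layout under /tmp are assumptions. I added --dissociate so that the clone copies the borrowed objects and does not keep pointing at the cache, which is gone once the image is built.

```shell
# Hypothetical helper for %post.
#   $1  repository URL
#   $2  path of a clone prepared on the host (somewhere under /tmp)
#   $3  destination directory inside the container
clone_with_reference() {
    url="$1"
    cache="$2"
    dest="$3"
    if [ -d "$cache" ]; then
        # Borrow objects from the cache during the clone, then copy them
        # into the new repository so it no longer depends on the cache.
        git clone --reference "$cache" --dissociate "$url" "$dest"
    else
        # No cache available: fall back to a normal clone.
        git clone "$url" "$dest"
    fi
}
```

On the host, the cache would be prepared once with git clone --mirror <url> /tmp/<some-dir> and refreshed with git remote update inside that directory before each build.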