Tags: .net, docker, nuget

Parallel Docker builds with dotnet restore thrash each other's caches


tl;dr: can multiple .NET applications be built concurrently on the same NuGet cache?

I have a Visual Studio solution containing multiple libraries and multiple applications to be hosted in Docker. I want my builds to be as fast as possible, so I lean on caching as much as I can. This is achieved in two ways: first, copy only the required .csproj files into their respective directories, so that this layer stays cached as long as I don't edit any .csproj. I can't use wildcards for this copy, because we need BuildKit for the next step.

Second, use --mount=type=cache,id=nuget,target=/root/.nuget/packages on each RUN instruction that touches packages, to get a reusable NuGet cache on the host between builds:

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build-env
WORKDIR /app

# Copy all required (transitive) project references
COPY src/SampleApp.Common/SampleApp.Common.csproj ./SampleApp.Common/SampleApp.Common.csproj
COPY src/SampleApp.Data/SampleApp.Data.csproj ./SampleApp.Data/SampleApp.Data.csproj

# Then copy ourselves
COPY src/SampleApp.Api/SampleApp.Api.csproj ./SampleApp.Api/SampleApp.Api.csproj

# Restore as distinct layers
# Mount NuGet cache dir as cache in Docker for faster subsequent builds.
RUN --mount=type=cache,id=nuget,target=/root/.nuget/packages \
    dotnet restore SampleApp.Api --runtime linux-x64

# Copy all required (transitive) project sources
COPY src/SampleApp.Common ./SampleApp.Common
COPY src/SampleApp.Data ./SampleApp.Data

# Then copy ourselves
COPY src/SampleApp.Api ./SampleApp.Api

# Build and publish the release (needs cached packages to copy on build)
RUN --mount=type=cache,id=nuget,target=/root/.nuget/packages \
    dotnet publish SampleApp.Api \
    --no-restore \
    --runtime linux-x64 \
    --self-contained false \
    --configuration Release \
    --output ./Publish/SampleApp.Api/

FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build-env /app/Publish/SampleApp.Api .
ENTRYPOINT ["dotnet", "SampleApp.Api.dll"]
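(For reference: RUN --mount=type=cache only works with BuildKit. Depending on the Docker and Compose versions in use, that may mean pinning the Dockerfile syntax and/or enabling BuildKit explicitly; newer Docker versions enable BuildKit by default, so neither line may be needed:)

# syntax=docker/dockerfile:1
# ^ optional first line of the Dockerfile, pins a frontend that understands cache mounts

# For older Docker/Compose setups, enable BuildKit explicitly:
DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1 docker compose build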

Now this works, although maintaining the Dockerfiles for larger projects becomes a hassle (adding a new library project means editing multiple lines in multiple Dockerfiles). But there's a problem when multiple projects need to be built at the same time.

The command:

docker compose up -d --build

With multiple projects whose Dockerfiles look like the one above, this shows the following after a while:

=> ERROR [sampleapp_api build-env 16/30] RUN --mount=type=cache,id=nuget,target=/root/.nuget/packages dotnet restore SampleApp.Api --runtime linux-x64

#12 34.56 /usr/share/dotnet/sdk/6.0.302/NuGet.targets(130,5): error : Could not find file '/root/.nuget/packages/microsoft.aspnetcore.app.runtime.linux-x64/6.0.7/fgp1y1vi.2tu'. [/app/SampleApp.Api/SampleApp.Api.csproj]

On the second run two projects fail at the same time:

/usr/share/dotnet/sdk/6.0.302/NuGet.targets(130,5): error : Could not find file '/root/.nuget/packages/runtime.any.system.runtime.interopservices/4.1.0/q0z4zwua.tf3
/usr/share/dotnet/sdk/6.0.302/NuGet.targets(130,5): error : Could not find file '/root/.nuget/packages/runtime.any.system.runtime.interopservices/4.1.0/xpvko1tv.2o3

Probably because the third project is restoring that same package at exactly that moment. This happens because multiple builds share the same NuGet cache:

--mount=type=cache,id=nuget,target=/root/.nuget/packages

And because NuGet extracts a package's contents via a randomly named file on every build, clearing out whatever is already there, on parallel builds one build wipes the other build's NuGet cache mid-restore, causing the restore error shown above.
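To make the race concrete, here is a rough illustration in plain shell (this is not what NuGet actually runs; the directory and file names are made up):

d=/tmp/fake-nuget-cache/somepackage/1.0.0
mkdir -p "$d"

simulate_restore() {
  tmp=$(mktemp -p "$d")            # "extract" into a randomly named temp file
  rm -f "$d"/*                     # clear whatever is already in the package folder
  mv "$tmp" "$d/somepackage.dll"   # fails if the other restore just deleted our temp file
}

simulate_restore & simulate_restore & wait

When the timing is unlucky, one of the two mv calls fails with "No such file or directory", which is the shell equivalent of the "Could not find file ... fgp1y1vi.2tu" error above.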

Now there are a few workarounds, none of which I like:

  1. Give each project its own NuGet cache (--mount=type=cache,id=nuget-sampleapp-api, ..., id=nuget-sampleapp-web, ...; see the sketch after this list). This blows up the caches on disk, and a few Docker builds already happily gobble up a few GB. NVMe space is expensive.
  2. When running into this error, build the project images manually, one by one (docker compose build sampleapp_api, ... sampleapp_web, ...). I'd have to do this for every application image every time a shared project gets updated. Or script it: more scripts, more maintenance, more non-standard build steps; I don't like it. And it defeats the purpose of a parallelized build.
  3. Run the build a couple of times. Yeah but no.
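For reference, workarounds 1 and 2 would look roughly like this (a sketch; the service names are the ones used above):

# Workaround 1: a separate cache per project, e.g. in SampleApp.Api's Dockerfile
RUN --mount=type=cache,id=nuget-sampleapp-api,target=/root/.nuget/packages \
    dotnet restore SampleApp.Api --runtime linux-x64

# Workaround 2: build the images one by one instead of in parallel
docker compose build sampleapp_api
docker compose build sampleapp_web
docker compose up -d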

Any other suggestions?


Solution

  • After struggling with this exact problem for a long time, I finally figured out why this does not work and how to fix it.

    This comment led me to the solution: https://github.com/NuGet/Home/issues/7060#issuecomment-732065148

    "So if there are concurrent restore operations running, which use different temp paths but they share the same global packages cache, the cache is not really protected with the locking mechanism at all."

    NuGet uses a temp folder (/tmp/NuGetScratch), which you can read about here: https://docs.microsoft.com/en-us/nuget/consume-packages/managing-the-global-packages-and-cache-folders. Mounting this temp directory along with the global-packages cache fixes the issue.
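    If you're unsure where these folders live inside your particular build image, you can list them with the dotnet CLI (the exact paths vary by NuGet version and user):

    docker run --rm mcr.microsoft.com/dotnet/sdk:6.0 dotnet nuget locals all --list
    # prints the http-cache, global-packages, temp and plugin-cache locations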

    The exact issue seems to be that NuGet uses the temp directory for a locking mechanism between restore processes; mounting the temp directory lets that locking work when restoring in parallel from Docker.

    At this point I just mount all the NuGet cache folders like this when doing NuGet restores:

    RUN \
      --mount=type=cache,target=/root/.nuget/packages \
      --mount=type=cache,target=/root/.local/share/NuGet/http-cache \
      --mount=type=cache,target=/root/.local/share/NuGet/plugin-cache \
      --mount=type=cache,target=/tmp/NuGetScratchroot \
      dotnet restore
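
    Applied to the Dockerfile from the question, the restore step would then look roughly like this (a sketch; it keeps the question's id=nuget on the packages mount, and the publish step can keep its existing --mount=type=cache,id=nuget,target=/root/.nuget/packages as-is):

    RUN \
      --mount=type=cache,id=nuget,target=/root/.nuget/packages \
      --mount=type=cache,target=/root/.local/share/NuGet/http-cache \
      --mount=type=cache,target=/root/.local/share/NuGet/plugin-cache \
      --mount=type=cache,target=/tmp/NuGetScratchroot \
      dotnet restore SampleApp.Api --runtime linux-x64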