
How does Bazel fetch Maven parent packages in parallel?


The typical way to use Gradle is to download all needed packages in one step and then build in the next. But suppose you have hundreds of Gradle projects and don't want to download every needed package in one step. I assume Bazel can cache per package, right? If so, how does it handle downloading packages in parallel that might share the same parent package? Two downloads could try to write the same files at the same time and conflict. The parents are not listed in the lock file, so without parsing the pom files by hand, it's hard to know what order to download things in.


Solution

  • It's a two-step process:

    1. Bazel doesn't understand pom.xml files itself, so it delegates to an external, Maven-aware tool that resolves the top-level pom.xml files (reading their parent POMs along the way) into a description of the full package graph. That graph can contain diamond dependencies, as you said. Bazel can be extended with bzlmod module extensions to integrate with such tools. For example, rules_jvm_external is a Bazel module that runs a resolver (Coursier by default, or Maven Resolver, the library formerly known as Aether) to work out the graph structure as a separate step from the build, and can pin the result to a lock file.
    2. Once the graph is known, rules_jvm_external generates a BUILD file for the @maven repository that mirrors its structure, like this:
    # Excerpt of the generated BUILD file for @maven. Each Maven dependency
    # edge becomes a `deps` entry: integration -> library -> core.
    jvm_import(
      name = "org_hamcrest_hamcrest_core",
      visibility = ["//visibility:public"],
      tags = ["maven_coordinates=org.hamcrest:hamcrest-core:1.3", "maven_repository=https://repo1.maven.org/maven2", "maven_sha256=66fdef91e9739348df7a096aa384a5685f4e875584cce89386a7a47251c4d8e9", "maven_url=https://repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar"],
      jars = ["@maven//:v1/https/repo1.maven.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar"],
      deps = [],  # leaf: depends on nothing else
    )
    
    jvm_import(
      name = "org_hamcrest_hamcrest_integration",
      visibility = ["//visibility:public"],
      tags = ["maven_coordinates=org.hamcrest:hamcrest-integration:1.3", "maven_repository=https://repo1.maven.org/maven2", "maven_sha256=70f418efbb506c5155da5f9a5a33262ea08a9e4d7fea186aa9015c41a7224ac2", "maven_url=https://repo1.maven.org/maven2/org/hamcrest/hamcrest-integration/1.3/hamcrest-integration-1.3.jar"],
      jars = ["@maven//:v1/https/repo1.maven.org/maven2/org/hamcrest/hamcrest-integration/1.3/hamcrest-integration-1.3.jar"],
      deps = ["@maven//:org_hamcrest_hamcrest_library"],
    )
    
    jvm_import(
      name = "org_hamcrest_hamcrest_library",
      visibility = ["//visibility:public"],
      tags = ["maven_coordinates=org.hamcrest:hamcrest-library:1.3", "maven_repository=https://repo1.maven.org/maven2", "maven_sha256=711d64522f9ec410983bd310934296da134be4254a125080a0416ec178dfad1c", "maven_url=https://repo1.maven.org/maven2/org/hamcrest/hamcrest-library/1.3/hamcrest-library-1.3.jar"],
      jars = ["@maven//:v1/https/repo1.maven.org/maven2/org/hamcrest/hamcrest-library/1.3/hamcrest-library-1.3.jar"],
      deps = ["@maven//:org_hamcrest_hamcrest_core"],
    )
    
    ...
    

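    Bazel then analyzes this BUILD file and adds these targets to its own internal dependency graph. Depending on which build target you request, it fetches only the JARs that target needs, evaluating the graph in parallel and handling dependencies before the targets that depend on them. Because the in-memory graph evaluator (Skyframe) keys each node and evaluates it at most once per build, a JAR with many dependents is fetched and written to disk only once, even when several targets request it concurrently, so there's no possibility of files being clobbered. Parent POMs never appear in this graph at all: they only matter during resolution in step 1, which is why they are absent from the lock file.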

    If you're interested in learning more, see the documentation on Bazel's evaluation model (Skyframe).
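Step 1 can be illustrated with a toy resolver in Python. This is not how Coursier or Maven Resolver are implemented: the coordinates (com.example:...), the Pom class, and the resolve and parent_chain functions are all hypothetical, and real resolution also handles version conflicts, scopes, exclusions, and properties. It models only the point the question raises: parent POMs are read while resolving (here, to inherit a managed version), but because they contribute no JAR of their own, they never land in the lock output.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Pom:
    """A drastically simplified, hypothetical POM with only the fields used here."""
    coords: str                    # "group:artifact:version"
    parent: Optional[str] = None   # coordinates of the parent POM, if any
    # Direct dependencies: "group:artifact" -> version, or None when the
    # version is inherited from an ancestor's dependencyManagement section.
    deps: Dict[str, Optional[str]] = field(default_factory=dict)
    managed: Dict[str, str] = field(default_factory=dict)  # dependencyManagement


# Hypothetical repository contents: a diamond (app -> a, b -> core) where
# a, b and core all share one parent POM that pins core's version.
REPO: Dict[str, Pom] = {
    "com.example:parent:1.0": Pom("com.example:parent:1.0",
                                  managed={"com.example:core": "2.1"}),
    "com.example:core:2.1": Pom("com.example:core:2.1", parent="com.example:parent:1.0"),
    "com.example:a:1.0": Pom("com.example:a:1.0", parent="com.example:parent:1.0",
                             deps={"com.example:core": None}),
    "com.example:b:1.0": Pom("com.example:b:1.0", parent="com.example:parent:1.0",
                             deps={"com.example:core": None}),
    "com.example:app:1.0": Pom("com.example:app:1.0",
                               deps={"com.example:a": "1.0", "com.example:b": "1.0"}),
}


def parent_chain(pom: Pom, repo: Dict[str, Pom]) -> List[Pom]:
    """Return the POM followed by its ancestors, nearest first."""
    chain = [pom]
    while chain[-1].parent is not None:
        chain.append(repo[chain[-1].parent])
    return chain


def resolve(roots: List[str], repo: Dict[str, Pom]) -> Tuple[Dict[str, List[str]], Set[str]]:
    """Resolve the full graph reachable from the root artifacts.

    Returns (lock, parents_read): lock maps each JAR artifact to its direct
    dependencies; parents_read holds the parent POMs that were consulted.
    """
    lock: Dict[str, List[str]] = {}
    parents_read: Set[str] = set()
    stack = list(roots)
    while stack:
        coords = stack.pop()
        if coords in lock:
            continue  # diamond: already resolved through another path
        pom = repo[coords]
        chain = parent_chain(pom, repo)
        parents_read.update(p.coords for p in chain[1:])
        deps = []
        for ga, version in pom.deps.items():
            if version is None:
                # Inherit from the nearest ancestor that manages this artifact
                # (raises StopIteration if none does; fine for a toy).
                version = next(p.managed[ga] for p in chain if ga in p.managed)
            deps.append(f"{ga}:{version}")
        lock[coords] = sorted(deps)
        stack.extend(deps)  # parents are never pushed, so never locked
    return lock, parents_read


if __name__ == "__main__":
    lock, parents = resolve(["com.example:app:1.0"], REPO)
    for artifact, deps in sorted(lock.items()):
        print(artifact, "->", deps)
    print("parent POMs read during resolution:", sorted(parents))
```

The `if coords in lock` check is where the diamond collapses: core is reached through both a and b but resolved once, and the shared parent is consulted without ever becoming a lock entry.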
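The "fetched only once" argument in step 2 can be made concrete with a toy Python model. It is not Bazel's code (Skyframe is written in Java), and the names FetchOnce, FakeRemote, fetch_closure and run_parallel_build are hypothetical. The assumption it illustrates is that the only shared state is a map from key to in-flight result, guarded by a lock: whoever asks first schedules the download, and every later caller waits on the same Future. The dependency edges mirror the hamcrest BUILD file excerpt; requesting all three targets at once produces six JAR requests, yet each JAR is downloaded once.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List, Set, Tuple

# Dependency edges mirroring the jvm_import targets in the BUILD excerpt.
DEPS: Dict[str, List[str]] = {
    "org_hamcrest_hamcrest_core": [],
    "org_hamcrest_hamcrest_library": ["org_hamcrest_hamcrest_core"],
    "org_hamcrest_hamcrest_integration": ["org_hamcrest_hamcrest_library"],
}


class FakeRemote:
    """Stands in for a Maven repository and counts downloads per JAR."""

    def __init__(self) -> None:
        self.downloads: Dict[str, int] = {}
        self._lock = threading.Lock()

    def download(self, name: str) -> str:
        with self._lock:
            self.downloads[name] = self.downloads.get(name, 0) + 1
        time.sleep(0.01)  # simulated network latency
        return f"bytes of {name}"


class FetchOnce:
    """Memoizes fetches by key: concurrent requests for one key share a Future."""

    def __init__(self, fetch: Callable[[str], str], pool: ThreadPoolExecutor) -> None:
        self._fetch = fetch
        self._pool = pool
        self._lock = threading.Lock()
        self._futures: Dict[str, Future] = {}

    def get(self, key: str) -> Future:
        # Check-and-insert under one lock, so exactly one caller schedules the
        # fetch and every other caller receives the same in-flight Future.
        with self._lock:
            fut = self._futures.get(key)
            if fut is None:
                fut = self._pool.submit(self._fetch, key)
                self._futures[key] = fut
        return fut


def transitive_jars(target: str, deps: Dict[str, List[str]]) -> Set[str]:
    """The target plus everything it depends on, directly or transitively."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(deps[t])
    return seen


def fetch_closure(target: str, deps: Dict[str, List[str]], fetcher: FetchOnce) -> Dict[str, str]:
    jars = sorted(transitive_jars(target, deps))
    futures = [fetcher.get(j) for j in jars]  # schedule everything before blocking
    return {j: f.result() for j, f in zip(jars, futures)}


def run_parallel_build(targets: List[str], deps: Dict[str, List[str]]
                       ) -> Tuple[List[Dict[str, str]], Dict[str, int]]:
    remote = FakeRemote()
    # Download tasks never wait on other tasks, so the pools cannot deadlock.
    with ThreadPoolExecutor(max_workers=4) as io_pool, \
         ThreadPoolExecutor(max_workers=4) as requesters:
        fetcher = FetchOnce(remote.download, io_pool)
        jobs = [requesters.submit(fetch_closure, t, deps, fetcher) for t in targets]
        results = [job.result() for job in jobs]
    return results, remote.downloads


if __name__ == "__main__":
    # 1 + 2 + 3 = 6 JAR requests in total; every download count is 1.
    _, downloads = run_parallel_build(list(DEPS), DEPS)
    print(downloads)
```

Across separate builds, Bazel's repository cache, which is keyed by an artifact's SHA-256 (the value recorded in the maven_sha256 tag), plays a similar deduplicating role on disk.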