
git plumbing command to get submodule remote


I am working with git plumbing and mirrored (and thus bare) repos, on a read-only filesystem.

I can see the existence of submodules with git ls-tree. I can infer their name/path and the SHA1, but I can't find a way to get the submodule remote.

The information is in there somewhere; if I clone the repo, git submodule init succeeds. (Making a clone is too expensive for normal use, particularly for very large repositories.) How can I get at the submodule remote directly?


Solution

  • Summary

    Start with git config --blob HEAD:.gitmodules --list and go from there. This requires Git 1.8.4 or later. HEAD can be replaced by any revision.

    Long form and explanation

    Comments turned into answer, with much of the answer provided by the OP. :-) Also, we have the following self-referential definitions: a superproject is a Git repository that has submodules, and a submodule (sometimes also called a subproject) is a Git repository controlled by a superproject. The submodule itself is normally kept checked-out at a specific commit (i.e., as a "detached HEAD"), although there are now special cases where you can direct Git to switch a submodule to a named branch. If a submodule has further submodules, the "outer" submodule is a superproject for the "inner" submodule, so super/sub is all relative.

    The submodules—the repository URLs and their checkout paths—are provided by a file named .gitmodules in the root directory of the superproject. Hence in a bare repository, you would obtain or extract the .gitmodules file. This file is formatted as a config file, so it is readable via git config --file.

    Since Git version 2.0, you can use the pseudo-name - to refer to stdin, so:

    git show HEAD:.gitmodules | git config --file - --list
    

    will dump the contents in a familiar format. (If your Git is older than that but your system has /dev/stdin, you can pass /dev/stdin as the file name instead of -.)

    It turns out that there is an even easier way, though: since Git version 1.8.4, git config can read a blob straight out of the repository with --blob. The blob identifier is anything acceptable to git rev-parse, which handles not only a branch name or commit ID but also a revision followed by a colon and a path name, as in HEAD:.gitmodules. (This code went in specifically for submodule handling: see commit 1bc888193e1044db317a45b9a4c8d2b87b998f40.)

    Details

    Given a submodule path P, the name of the submodule is whichever <name> has submodule.<name>.path set to P. The URL for that submodule is then submodule.<name>.url.
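
    The lookup just described (path to name, then name to URL) can be sketched as a single pass over the --list output. This is a minimal sketch, not part of the original answer: the function name submodule_url_for_path is hypothetical, and it assumes that keys and values contain no embedded newlines.

    ```shell
    # Hypothetical helper: print the URL of the submodule whose path is $1.
    # Expects `git config --list` style lines (key=value) on stdin.
    submodule_url_for_path() {
        awk -v want="$1" '
            {
                eq = index($0, "=")          # split at the first "=" only
                if (eq == 0) next            # skip keys that have no value
                key = substr($0, 1, eq - 1)
                val = substr($0, eq + 1)
            }
            key ~ /^submodule\..*\.path$/ {
                name = key                   # the name may itself contain dots
                sub(/^submodule\./, "", name); sub(/\.path$/, "", name)
                path_of[name] = val
            }
            key ~ /^submodule\..*\.url$/ {
                name = key
                sub(/^submodule\./, "", name); sub(/\.url$/, "", name)
                url_of[name] = val
            }
            END {
                # entries may appear in any order, so match them up at the end
                for (name in path_of)
                    if (path_of[name] == want && (name in url_of))
                        print url_of[name]
            }
        '
    }
    ```

    For example, git config --blob HEAD:.gitmodules --list | submodule_url_for_path some/dir.name/path should print the URL recorded for that path. A relative URL such as ../bar.git is printed as stored; Git resolves such URLs against the superproject's own remote, which this sketch does not attempt.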

    It's possible to search out the desired name using git config --get-regexp. However, it's annoying at best since we must then quote pathname components that are regular expression meta-characters, with the obvious common one being .:

    $ git config --blob HEAD:.gitmodules \
        --get-regexp '^submodule\..*\.path$' '^some/dir\.name/path$'
    submodule.foo.path some/dir.name/path
    

    so it probably makes more sense just to dump out the configuration with --list and use something else to extract the interesting fields. For instance:

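    git config --blob HEAD:.gitmodules --list | \
        awk -F= -v path='some/dir.name/path' \
        '$1 ~ /^submodule\..*\.path$/ && $2 == path {
            # strip prefix and suffix; the name itself may contain dots
            name = $1; sub(/^submodule\./, "", name); sub(/\.path$/, "", name)
            print name }'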
    

    (although by the time you put this into something that can read trees looking for gitlinks, you probably want Python or some such).
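
    Staying in shell rather than Python, the tree-reading step mentioned above can be sketched as a filter over git ls-tree -r output, where gitlinks are the entries with mode 160000 and type commit. This is only a sketch under stated assumptions: the function name gitlinks_from_ls_tree is hypothetical, and it assumes the default ls-tree output format, in which unusual path names may appear quoted.

    ```shell
    # Hypothetical helper: print "<sha> <path>" for every gitlink found in
    # `git ls-tree -r` output read from stdin.
    # Each input line looks like: <mode> SP <type> SP <object> TAB <path>
    gitlinks_from_ls_tree() {
        awk -F '\t' '
            {
                split($1, meta, " ")         # meta: mode, type, object id
                if (meta[1] == "160000" && meta[2] == "commit")
                    print meta[3], substr($0, length($1) + 2)
            }
        '
    }
    # Usage (assumes the current directory is the bare repository):
    #   git ls-tree -r HEAD | gitlinks_from_ls_tree
    ```

    Each printed path can then be looked up in the .gitmodules configuration to find the matching submodule name and URL.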