I am working with git plumbing and mirrored (and thus bare) repos, in a readonly filesystem.
I can see the existence of submodules with git ls-tree. I can infer their name/path and the SHA1, but I can't find a way to get the submodule remote.
The information is in there somewhere; if I clone the repo, git submodule init succeeds. (Making a clone is too expensive for normal use, particularly for very large repositories.) How can I get at the submodule remote directly?
Start with
git config --blob HEAD:.gitmodules --list
and go from there. This requires Git 1.8.4 or later. Note that HEAD can be any revision.
Comments turned into answer, with much of the answer provided by the OP. :-) Also, we have the following self-referential definitions: a superproject is a Git repository that has submodules, and a submodule (sometimes also called a subproject) is a Git repository controlled by a superproject. The submodule itself is normally kept checked-out at a specific commit (i.e., as a "detached HEAD"), although there are now special cases where you can direct Git to switch a submodule to a named branch. If a submodule has further submodules, the "outer" submodule is a superproject for the "inner" submodule, so super/sub is all relative.
The submodules (their repository URLs and their checkout paths) are listed in a file named .gitmodules in the root directory of the superproject. Hence in a bare repository, you would obtain or extract the .gitmodules file. This file is formatted as a config file, so it is readable via git config --file.
Since Git version 2.0, you can use the pseudo-name - to refer to stdin, so:
git show HEAD:.gitmodules | git config --file - --list
will dump the contents in a familiar format. (If your Git is older than that, but your system has /dev/stdin, you can pass --file /dev/stdin here instead.)
It turns out that there is an even easier way, though: git config can, since Git version 1.8.4, read a blob straight out of the repository with --blob. The blob identifier is anything acceptable to git rev-parse, which can handle not only a branch name or commit ID, but also a subsequent path name, as in HEAD:.gitmodules. (This code went in specifically for submodule handling: see commit 1bc888193e1044db317a45b9a4c8d2b87b998f40.)
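From a script, the same read can be wrapped in a couple of small functions. This is a minimal Python sketch, not a definitive implementation: the function names gitmodules_argv and read_gitmodules and the repository path in the usage note are hypothetical, and it assumes a git binary (1.8.4 or later) is on PATH.

```python
import subprocess


def gitmodules_argv(git_dir, rev="HEAD"):
    """Build the command that lists .gitmodules as stored in revision `rev`."""
    return [
        "git",
        "--git-dir=" + git_dir,          # point Git at the bare repository
        "config",
        "--blob", rev + ":.gitmodules",  # read the blob, no work tree needed
        "--list",
    ]


def read_gitmodules(git_dir, rev="HEAD"):
    """Run the command and return its key=value listing as text.

    Raises subprocess.CalledProcessError if Git fails, for example when
    the revision has no .gitmodules file.
    """
    result = subprocess.run(
        gitmodules_argv(git_dir, rev),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, read_gitmodules("/srv/mirrors/project.git", "v1.0") would return the listing for a hypothetical tag v1.0. Nothing is written to the repository, so this should also work on a read-only filesystem.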
Given a submodule path P, the name of the submodule is whichever <name> has submodule.<name>.path set to P. The URL for that submodule is then submodule.<name>.url.
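The path-to-name-to-URL lookup described above can be sketched in Python, working on the text that git config --list prints. This is an illustrative sketch: the function name submodules_by_path and the sample entries are made up, and it assumes neither keys nor values contain newlines and that submodule names contain no "=".

```python
def submodules_by_path(listing):
    """Map each submodule path to (name, url) from `git config --list` text."""
    fields = {}  # name -> {"path": ..., "url": ...}
    for line in listing.splitlines():
        key, sep, value = line.partition("=")
        if not sep or not key.startswith("submodule."):
            continue
        # rpartition splits at the LAST dot, so dots inside the
        # submodule name itself are preserved.
        name, _, var = key[len("submodule."):].rpartition(".")
        if name and var in ("path", "url"):
            fields.setdefault(name, {})[var] = value
    return {
        f["path"]: (name, f.get("url"))
        for name, f in fields.items()
        if "path" in f
    }


# Made-up sample listing, in the format printed by `git config --list`.
sample = (
    "submodule.foo.path=some/dir.name/path\n"
    "submodule.foo.url=https://example.com/foo.git\n"
    "submodule.lib.v2.path=vendor/lib\n"
    "submodule.lib.v2.url=../lib.git\n"
)
print(submodules_by_path(sample)["some/dir.name/path"])
# prints ('foo', 'https://example.com/foo.git')
```

A relative URL such as ../lib.git is returned as written; resolving it against the superproject's own remote is left to the caller.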
It's possible to search out the desired name using git config --get-regexp. However, it's annoying at best, since we must then quote any pathname components that are regular expression meta-characters, the obvious common one being the dot (.):
$ git config --blob HEAD:.gitmodules \
--get-regexp 'submodule\..*\.path' 'some/dir\.name/path'
submodule.foo.path some/dir.name/path
so it probably makes more sense just to dump out the configuration with --list and use something else to extract the interesting fields. For instance:
git config --blob HEAD:.gitmodules --list |
awk -F= -v path='some/dir.name/path' \
'$1 ~ /^submodule\..*\.path$/ && $2 == path { sub(/^submodule\./, "", $1); sub(/\.path$/, "", $1); print $1 }'
(although by the time you put this into something that can read trees looking for gitlinks, you probably want Python or some such).
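As a starting point for that, here is a minimal Python sketch that picks the gitlinks out of git ls-tree -r output. Gitlink entries are the ones with mode 160000 and type commit. The function name gitlinks and the sample object IDs are made up for illustration, and the sketch assumes plain (not -z) output with no quoted path names.

```python
def gitlinks(ls_tree_output):
    """Return {path: commit_id} for gitlink entries in `git ls-tree -r` text.

    Each line has the form "<mode> <type> <object>\t<path>".
    """
    links = {}
    for line in ls_tree_output.splitlines():
        meta, tab, path = line.partition("\t")
        parts = meta.split()
        if tab and len(parts) == 3 and parts[0] == "160000" and parts[1] == "commit":
            links[path] = parts[2]
    return links


# Made-up sample output; the object IDs are placeholders, not real objects.
sample_tree = (
    "100644 blob 1111111111111111111111111111111111111111\t.gitmodules\n"
    "100644 blob 2222222222222222222222222222222222222222\tREADME\n"
    "160000 commit 3333333333333333333333333333333333333333\tsome/dir.name/path\n"
)
print(gitlinks(sample_tree))
# prints {'some/dir.name/path': '3333333333333333333333333333333333333333'}
```

Each path returned here can then be matched against the submodule.<name>.path values from .gitmodules to find the corresponding URL.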