I want to query the latest remote HEAD commit for 10000+ https git repositories, every hour. Basically this x10000:
remote=https://github.com/torvalds/linux
git ls-remote $remote HEAD | awk '{print $1}'
Most, but not all, remotes are on a single server (github.com). I do not want to use the github API, because some repos are not on github, and because the API has limits.
Hence I want to use the git https remote protocol, but I would prefer to implement this with (lib)curl instead of git, to get more control over the https settings and hopefully do requests in parallel over the same connection.
Where can I find more information about what http request git ls-remote is making under the hood (using the "smart" git protocol), such that I can perform the same call with libcurl?
I had a look at the HTTP transfer protocols spec and the docs on Git-Internals-Transfer-Protocols, but these are very generic and don't go into the details of ls-remote.
I think the Discover references section of the http protocol documentation is what you want.
If you're interacting with GitHub, you need to use the "smart" protocol, because:
$ curl https://github.com/docker/docker.git/info/refs
Please upgrade your git client.
GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days
So, following the documentation, we need to run:
$ curl https://github.com/docker/docker.git/info/refs'?service=git-upload-pack'
This produces binary output, which curl will by default not display on your terminal. If we dump it to a file (-o refs.txt) and then inspect the file, we see we have almost exactly the output of git ls-remote.
Compare:
$ git ls-remote https://github.com/docker/docker.git | head -5
235f86270d4976e7d17c11eccdb65f81d76f5c40 HEAD
175f1829377413b1887a0c38232b1cda975fd71c refs/heads/1.12.x
473c5701cb66403b0535a5c01845cb0f27fbeb47 refs/heads/1.13.x
ceb9e244d934d87104b7e4e0032f1d389e47fd64 refs/heads/17.03.x
a1e8b2ede880ffa159c72b4d62c827475fbff531 refs/heads/17.04.x
And:
$ curl -s https://github.com/docker/docker.git/info/refs'?service=git-upload-pack' | head -7
001e# service=git-upload-pack
00000156235f86270d4976e7d17c11eccdb65f81d76f5c40 HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-gf942d1d040ff
003f175f1829377413b1887a0c38232b1cda975fd71c refs/heads/1.12.x
003f473c5701cb66403b0535a5c01845cb0f27fbeb47 refs/heads/1.13.x
0040ceb9e244d934d87104b7e4e0032f1d389e47fd64 refs/heads/17.03.x
0040a1e8b2ede880ffa159c72b4d62c827475fbff531 refs/heads/17.04.x
004089658bed64c2a8fe05a978e5b87dbec409d57a0f refs/heads/17.05.x
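There's some protocol data there you would need to decode based on the documentation: each line is prefixed with a four-digit hexadecimal length (the "pkt-line" format, where 0000 is a flush packet), and the first reference line carries the server's capability list after a NUL byte. Otherwise this provides you with the same list of references as git ls-remote.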
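To illustrate the decoding step, here is a minimal Python sketch of a pkt-line parser for this kind of response. The function names (pkt, parse_pkt_lines, parse_ref_advertisement) are my own and not part of git or any library, and it only handles the plain ref advertisement shown above, not protocol v2 responses.

```python
def pkt(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length (prefix included), then payload."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data: bytes):
    """Yield each pkt-line payload as bytes, or None for a flush packet (0000)."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            # Flush packet: no payload, just the 4-byte marker.
            yield None
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad pkt-line length {length} at offset {pos}")
        yield data[pos + 4:pos + length]
        pos += length


def parse_ref_advertisement(data: bytes):
    """Return (refs, capabilities) from a smart-HTTP info/refs response body.

    refs maps ref names (e.g. "HEAD") to hex object ids.
    """
    refs = {}
    capabilities = []
    for payload in parse_pkt_lines(data):
        # Skip flush packets and the "# service=git-upload-pack" banner.
        if payload is None or payload.startswith(b"# "):
            continue
        line = payload.rstrip(b"\n")
        if b"\0" in line:
            # The first ref line carries the capability list after a NUL byte.
            line, caps = line.split(b"\0", 1)
            capabilities = caps.decode().split()
        object_id, name = line.decode().split(" ", 1)
        if name == "capabilities^{}":
            # Placeholder entry sent by repositories that have no refs yet.
            continue
        refs[name] = object_id
    return refs, capabilities
```

Given the bytes saved in refs.txt, parse_ref_advertisement(open("refs.txt", "rb").read())[0]["HEAD"] should be the same commit id that git ls-remote prints for HEAD.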
Most servers should support the "smart" protocol.
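If you want to check that assumption per remote, the HTTP protocol documentation says a smart server answers with the Content-Type application/x-git-upload-pack-advertisement and a body whose first pkt-line is "# service=git-upload-pack". Below is a rough Python sketch of that check using only the standard library; the helper names are hypothetical, and a libcurl version would make the same GET request and inspect the same header.

```python
import urllib.request

SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(remote: str) -> str:
    """Build the smart-HTTP ref discovery URL for a remote."""
    return remote.rstrip("/") + "/info/refs?service=git-upload-pack"


def is_smart_response(content_type: str, body: bytes) -> bool:
    """True if the headers and body look like a smart ref advertisement."""
    # Skip the 4-byte pkt-line length prefix before checking the banner.
    return (
        content_type == SMART_CONTENT_TYPE
        and body[4:].startswith(b"# service=git-upload-pack")
    )


def fetch_advertisement(remote: str, timeout: float = 10.0):
    """GET the ref advertisement; returns (content_type, body)."""
    with urllib.request.urlopen(info_refs_url(remote), timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()
```

For example, is_smart_response(*fetch_advertisement("https://github.com/docker/docker.git")) performs one request and tells you whether the body can be decoded as a ref advertisement.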