
How to manually do `git ls-remote` with a http client


I want to query the latest remote HEAD commit for 10000+ https git repositories, every hour. Basically this x10000:

remote=https://github.com/torvalds/linux
git ls-remote $remote HEAD | awk '{print $1}'

Most, but not all, remotes are on a single server (github.com). I do not want to use the GitHub API, because some repos are not on GitHub, and because the API has rate limits.

Hence I want to use the git HTTPS remote protocol, but I would prefer to implement it with (lib)curl instead of git, to get more control over the HTTPS settings and, hopefully, to make requests in parallel over the same connection.

Where can I find more information about which HTTP request git ls-remote makes under the hood (using the "smart" git protocol), so that I can perform the same call with libcurl?

I had a look at the HTTP transfer protocols spec and the docs on Git Internals - Transfer Protocols, but they are very generic and don't go into the details of ls-remote.


Solution

  • I think the "Discovering References" section of the HTTP protocol documentation is what you want.

    If you're interacting with GitHub, you need to use the "smart" protocol, because:

    $ curl https://github.com/docker/docker.git/info/refs
    Please upgrade your git client.
    GitHub.com no longer supports git over dumb-http: https://github.com/blog/809-git-dumb-http-transport-to-be-turned-off-in-90-days
    

    So, following the documentation, we need to run:

    $ curl https://github.com/docker/docker.git/info/refs'?service=git-upload-pack'
    

    This produces binary output, which curl will by default not display on your terminal. If we dump it to a file (-o refs.txt) and then inspect the file, we see we have almost exactly the output of git ls-remote.

    Compare:

    $ git ls-remote https://github.com/docker/docker.git | head -5
    235f86270d4976e7d17c11eccdb65f81d76f5c40        HEAD
    175f1829377413b1887a0c38232b1cda975fd71c        refs/heads/1.12.x
    473c5701cb66403b0535a5c01845cb0f27fbeb47        refs/heads/1.13.x
    ceb9e244d934d87104b7e4e0032f1d389e47fd64        refs/heads/17.03.x
    a1e8b2ede880ffa159c72b4d62c827475fbff531        refs/heads/17.04.x
    

    And:

    $ curl -s https://github.com/docker/docker.git/info/refs'?service=git-upload-pack' | head -7
    001e# service=git-upload-pack
    00000156235f86270d4976e7d17c11eccdb65f81d76f5c40 HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow deepen-since deepen-not deepen-relative no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master filter object-format=sha1 agent=git/github-gf942d1d040ff
    003f175f1829377413b1887a0c38232b1cda975fd71c refs/heads/1.12.x
    003f473c5701cb66403b0535a5c01845cb0f27fbeb47 refs/heads/1.13.x
    0040ceb9e244d934d87104b7e4e0032f1d389e47fd64 refs/heads/17.03.x
    0040a1e8b2ede880ffa159c72b4d62c827475fbff531 refs/heads/17.04.x
    004089658bed64c2a8fe05a978e5b87dbec409d57a0f refs/heads/17.05.x
    

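    There's some protocol data there that you would need to decode based on the documentation. Every line is a "pkt-line" prefixed with four hex digits giving its length, including those four digits (001e = 30 = 4 + 26 bytes of "# service=git-upload-pack\n"). "0000" is a flush packet that separates sections. The first ref line carries the server's capability list after a NUL byte, which is why "HEAD" and "multi_ack" appear glued together above. Apart from that, this is the same list of references that git ls-remote prints.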
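The request and the decoding described above can be sketched in Python using only the standard library. This is a minimal, hedged sketch that assumes the server answers with the original ("version 0") ref advertisement shown in the curl output. No `Git-Protocol: version=2` header is sent, so that format is what a server should return. The function names (`discovery_url`, `fetch_advertisement`, `parse_ref_advertisement`, `head_sha`) are hypothetical, and the code has not been tested against every Git host.

```python
from __future__ import annotations

from urllib.request import Request, urlopen

# Content type a smart-HTTP server uses for the ref advertisement.
SMART_CTYPE = "application/x-git-upload-pack-advertisement"


def discovery_url(remote: str) -> str:
    """Ref-discovery URL used by the smart HTTP protocol."""
    return remote.rstrip("/") + "/info/refs?service=git-upload-pack"


def fetch_advertisement(remote: str, timeout: float = 30.0) -> bytes:
    """GET the ref advertisement; refuse anything that is not smart HTTP."""
    with urlopen(Request(discovery_url(remote)), timeout=timeout) as resp:
        ctype = resp.headers.get("Content-Type", "").split(";")[0].strip()
        if ctype != SMART_CTYPE:
            raise ValueError(f"{remote}: not a smart-HTTP reply ({ctype!r})")
        return resp.read()


def parse_ref_advertisement(data: bytes) -> tuple[dict[str, str], list[str]]:
    """Decode pkt-lines into ({ref: sha}, server capabilities)."""
    refs: dict[str, str] = {}
    caps: list[str] = []
    pos = 0
    while pos < len(data):
        # 4 hex digits giving the packet length, counting these 4 bytes.
        length = int(data[pos:pos + 4], 16)
        if length == 0:  # "0000" flush-pkt separates sections
            pos += 4
            continue
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad pkt-line length {length} at offset {pos}")
        payload = data[pos + 4:pos + length].rstrip(b"\n")
        pos += length
        if payload.startswith(b"# service="):
            continue  # smart-HTTP banner line, not a ref
        # The first ref line carries the capability list after a NUL byte.
        line, _, cap_bytes = payload.partition(b"\0")
        if cap_bytes:
            caps = cap_bytes.decode().split()
        sha, _, ref = line.decode().partition(" ")
        refs[ref] = sha
    return refs, caps


def head_sha(remote: str) -> str:
    """Rough equivalent of: git ls-remote $remote HEAD | awk '{print $1}'.

    Raises KeyError if the server does not advertise HEAD at all.
    """
    refs, _ = parse_ref_advertisement(fetch_advertisement(remote))
    return refs["HEAD"]
```

Usage would be something like `head_sha("https://github.com/torvalds/linux")`. The capability list also contains `symref=HEAD:refs/heads/...`, which tells you which branch HEAD points to. Note that the whole advertisement is still downloaded even though only the HEAD line is used.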


    Most servers should support the "smart" protocol.
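To check the 10000+ repositories from the question every hour, the per-remote lookup can be run concurrently. Below is a minimal sketch, assuming a hypothetical `head_of(remote)` callable that returns the HEAD sha for one remote. Such a callable could request the `/info/refs?service=git-upload-pack` URL shown above and take the sha on the HEAD line. The function name `latest_heads` and the default of 32 workers are illustrative choices, not tuned values.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def latest_heads(
    remotes: Iterable[str],
    head_of: Callable[[str], str],  # hypothetical per-remote lookup
    workers: int = 32,
) -> dict[str, str | Exception]:
    """Run head_of(remote) for every remote on a thread pool.

    Returns {remote: sha} on success and {remote: exception} on failure,
    so one unreachable server does not abort the whole hourly run.
    """
    results: dict[str, str | Exception] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(head_of, r): r for r in remotes}
        for fut in as_completed(futures):
            remote = futures[fut]
            try:
                results[remote] = fut.result()
            except Exception as exc:  # record the failure instead of raising
                results[remote] = exc
    return results
```

With plain `urllib`, each request opens its own connection, so this sketch parallelizes requests but does not share a connection. Reusing connections to github.com, or multiplexing requests over one HTTP/2 connection, is where libcurl's multi interface comes in (for example, with `CURLMOPT_PIPELINING` set to `CURLPIPE_MULTIPLEX`).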