We use GitHub to manage a great deal of our software environment, and I would wager that, as at many other orgs, the overwhelming majority of traffic to and from our repositories comes from our office. With that in mind, is there a way to build a local cache of a given GitHub repository, but still have the protection of the cloud version? I'm thinking of this in the model of a caching proxy server, where the local server (presumably in our building, on our local network) would handle the vast majority of clone/pull operations.
This seems like it should be doable, but searching for it has been very difficult, I think in no small part because the words "local" and "cache" have overloaded meanings, especially for git(hub) questions.
You should check out the git-cache-http-server project. I think it partly implements what you need (and is similar to the idea from @larsks's post).
It is a Node.js program that runs an HTTP server to give you access to locally cached git repositories. The server automatically fetches upstream changes when required. If you use those local git repositories instead of the remote ones, your git client will be served locally cached content.
If you run git-cache-http-server on a separate host (a VM or container, for example), you can configure your local git client to clone and fetch from the cache automatically by having it replace https://github.com with something like http://gitcache/github.com. This can be achieved with a configuration like:
git config --global url."http://gitcache:1234/".insteadOf https://
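To see what that rewrite does before committing to it, you can ask git to expand a URL without contacting any server. This is only a sketch: the host gitcache, the port 1234, and the repository example/repo are placeholders, and the rule is passed with -c so your real configuration is left untouched.

```shell
# Apply the rewrite rule for this one command only (-c), then ask git
# to print the URL it would actually use. --get-url exits without
# talking to the remote, so no cache server needs to be running.
rewritten="$(git -c url."http://gitcache:1234/".insteadOf=https:// \
    ls-remote --get-url https://github.com/example/repo.git)"

# The "https://" prefix is replaced by the cache's base URL, so the
# original host name ends up as the first path component.
echo "$rewritten"
```

With the rule in place, this should print http://gitcache:1234/github.com/example/repo.git, which is the form of address the cache is expected to receive.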
At the moment, this software only provides a cache for cloning and updating a repository; there is no provision for pushing changes back. For some use cases this can still be useful, for example a CI infrastructure that needs to pull the content of multiple repositories even when only a single one has changed, or the automated testing you mention.
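Because the cache is read-only, a blanket insteadOf rule would also send pushes to it. One possible workaround, which relies on git's own url.<base>.pushInsteadOf setting rather than anything the cache project provides, is to map GitHub push URLs onto themselves so that only fetches are redirected. The sketch below tries this in a throwaway repository with local config; gitcache:1234 and example/repo are placeholders.

```shell
# Work in a scratch repository so no global settings are changed.
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"
git remote add origin https://github.com/example/repo.git

# Redirect fetches and clones to the cache...
git config url."http://gitcache:1234/".insteadOf https://
# ...but give pushes their own rule that maps GitHub URLs to themselves.
# When a pushInsteadOf rule matches, git uses it for the push URL
# instead of the insteadOf rewrite.
git config url."https://github.com/".pushInsteadOf https://github.com/

# "git remote get-url" expands both kinds of rule, so it shows the
# effective fetch and push addresses.
fetch_url="$(git remote get-url origin)"
push_url="$(git remote get-url --push origin)"
echo "fetch: $fetch_url"
echo "push:  $push_url"
```

If this behaves as intended, the fetch URL points at the cache while the push URL still points at GitHub. A remote with an explicit pushurl ignores pushInsteadOf, so setting pushurl per repository is an alternative.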