I know that git LFS causes git to store a small text "pointer" file in the repo, and then git LFS downloads the target binary file separately. In this way, git repos are smaller on the remote git server. But git LFS still has to store the binary files somewhere, so it seems to me that the local storage (after a git lfs pull) is no different, and the combined sum of the remote git LFS server data plus the remote git data would still be similar.
What am I missing? How does git LFS efficiently track binary files?
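To illustrate what I mean by a "pointer": the file Git actually commits for an LFS-tracked path is just a tiny text stub, roughly like this (the path, hash, and size below are made up):

```bash
# Show what Git actually stores in the repo for an LFS-tracked file
# (the path and hash here are hypothetical):
git show HEAD:assets/big-model.bin
# Typical output -- a tiny text pointer instead of the binary itself:
#   version https://git-lfs.github.com/spec/v1
#   oid sha256:4d7a2146...        <-- made-up hash of the real binary
#   size 1073741824               <-- size of the real binary, in bytes
```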
My experience with git lfs: I now recommend against using git lfs.
I began with this question because I believed Git LFS was amazing and wonderful and I wanted to know how it worked. Instead, I ended up realizing Git LFS was the cause of my daily workflow problems and that I shouldn't use it or recommend it anymore.
Summary:
For personal, free GitHub accounts, it is way too limiting, and for paid, corporate accounts, it makes git checkout go from taking a few seconds to up to 3+ hours, especially for remote workers, which is a total waste of their time. I dealt with that for three years and it was horrible. I wrote a script to do a git lfs fetch once per night to mitigate this, but my employer refused to buy me a bigger SSD to give me enough space to do git lfs fetch --all once per night, so I still ran into the multi-hour-checkout problem frequently. It's also impossible to undo the integration of git lfs into your repo unless you delete your whole GitHub repo and recreate it from scratch.
Details:
I just discovered that the free version of git lfs has such strict limits that it's useless, and I'm now in the process of removing it from all my public free repos. See this answer (Repository size limits for GitHub.com) and search for the "git lfs" parts.
It seems to me that the only benefit of git lfs is that it avoids downloading a ton of data all at once when you clone a repo. That's it! That seems like a pretty minimal, if not useless, benefit for any repo which has a total content size (git repo + would-be git lfs repo) < 2 TB or so. All that using git lfs does is:

1. make git checkout take forever (literally hours) (bad), and
2. make git checkout and similar previously-offline git commands now become online-and-slow git commands (bad).

If you're trying to use git lfs to overcome GitHub's 100 MB max file size limit, like I was, don't! You'll run out of git lfs space almost instantly, in particular if anyone clones or forks your repo, as that counts against your limits, not theirs! Instead, "a tool such as tar plus split, or just split alone, can be used to split a large file into smaller parts, such as 90 MB each" (source), so that you can then commit those binary file chunks to your regular git repo.
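For example, here is a rough sketch of that split-and-commit approach; the file names and chunk size are hypothetical, and GNU split is assumed:

```bash
# Split a large binary into 90 MB chunks so each chunk stays under GitHub's
# 100 MB limit (file names are hypothetical; -d numeric suffixes need GNU split):
split -b 90M -d -a 4 big_dataset.bin big_dataset.bin.part
git add big_dataset.bin.part*
git commit -m "Add big_dataset.bin as 90 MB chunks"

# Anyone who clones the repo can reassemble the original file from the chunks:
cat big_dataset.bin.part* > big_dataset.bin
```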
Lastly, the "solution" on GitHub to stop using git lfs and totally free up that space again is absolutely crazy nuts! You have to delete your entire repo! See this Q&A here: How to delete a file tracked by git-lfs and release the storage quota?
GitHub's official documentation confirms this (emphasis added):
After you remove files from Git LFS, the Git LFS objects still exist on the remote storage and will continue to count toward your Git LFS storage quota.
To remove Git LFS objects from a repository, delete and recreate the repository. When you delete a repository, any associated issues, stars, and forks are also deleted.
I can't believe this is even considered a "solution". I really hope they're working on a better fix for it.
My advice to companies considering git lfs:

Quick summary: don't use git lfs. Buy your employees bigger SSDs instead. If you do end up using git lfs, buy your employees bigger SSDs anyway, so they can run a script to do git lfs fetch --all once per night while they are sleeping.
Details:
Let's say you're a tech company with a massive mono-repo that is 50 GB in size, plus binary files and data that you'd like to be part of the repo which are 4 TB in size. Rather than giving your employees insufficient 500 GB to 2 TB SSDs and then resorting to git lfs, which makes git checkouts go from seconds to hours when done over home internet connections, get them bigger solid state drives instead! A typical tech employee costs you > $1000/day (5 working days per week x 48 working weeks/year x $1000/day = $240k/year, which is less than their salary + benefits + overhead costs). So, a $1000 8 TB SSD is totally worth it if it saves them hours of waiting and hassle!
Now they will hopefully have enough space to run git lfs fetch --all in an automated nightly script to fetch LFS contents for all remote branches to help mitigate (but not solve) this, or at least git lfs fetch origin branch1 branch2 branch3 to fetch the contents for the hashes of their most-used branches.
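Here is a rough sketch of what such a nightly script could look like; the repo path and branch names are placeholders, and it assumes you schedule it with cron or a similar tool:

```bash
#!/usr/bin/env bash
# nightly_lfs_fetch.sh -- pre-download LFS objects overnight so the next day's
# git checkout is mostly a local operation. Paths and branch names are placeholders.

cd /home/me/dev/big-monorepo || exit 1

# Update remote-tracking refs first.
git fetch origin

# If the SSD is big enough, grab LFS objects for all remote branches:
git lfs fetch --all

# Otherwise, at least grab LFS objects for the most-used branches:
# git lfs fetch origin main develop my-feature-branch
```

A crontab entry such as 0 3 * * * /home/me/dev/nightly_lfs_fetch.sh (again, a placeholder path) would run it at 3 a.m. every night while the employee sleeps.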
See also:

1. When you may not need git lfs at all [even for remote repos]: Do I need Git LFS for local repos?
2. git lfs post-checkout hook after failed git checkout
3. What is the difference between git lfs fetch, git lfs fetch --all, and git lfs pull?
When you clone a Git repository, you have to download a compressed copy of its entire history. Every version of every file is accessible to you.
With Git LFS, the file data are not stored in the repository, so when you clone the repository Git does not have to download the complete history of the files stored in LFS. Only the "current" version of each LFS file is downloaded from the LFS server. Technically, LFS files are downloaded during "checkout" rather than "clone."
So Git LFS is not so much about storing large files efficiently as it is about avoiding the download of unneeded versions of selected files. That history is often not very interesting anyway, and if you do need an older version, Git can connect to the LFS server and get it. This is in contrast to regular Git, which lets you check out any commit offline.
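To make that concrete, here is a rough sketch of the usual Git LFS workflow; the file names, pattern, and URL are just examples. Only the small pointer files live in the Git history, and the binary content for the commit you check out is fetched from the LFS server:

```bash
# One-time setup in a repo (the *.psd pattern and file name are just examples):
git lfs install
git lfs track "*.psd"
git add .gitattributes design.psd
git commit -m "Track Photoshop files with Git LFS"

# A normal clone downloads the full Git history of pointer files, but only the
# LFS binaries needed for the checked-out commit:
git clone https://example.com/org/repo.git

# You can even skip downloading LFS content at clone time entirely...
GIT_LFS_SKIP_SMUDGE=1 git clone https://example.com/org/repo.git
# ...and fetch the binaries later, on demand:
cd repo && git lfs pull
```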