
Understanding .gitignore masking and git clean


I'm having a really hard time understanding how the .gitignore file works...

This is what my file looks like:

custom/history
cache
*.log
custom/modules/*/Ext
upload
sugar-cron*
custom/application/Ext
custom/Extenstion/modules/*/Ext/Language
!custom/modules/*/Language/cs_CZ.*
!custom/modules/*/Language/en_us.*
custom/Extenstion/application/Ext/Language
!custom/Extenstion/application/Ext/Language/cs_CZ.*
!custom/Extenstion/application/Ext/Language/en_US.*
.htaccess
config.php
config_override.php
files.md5

This is what my git status looked like:

apache@cb772759c68a sugarcrm$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#    LOG.txt
#    deploy_backup/
nothing added to commit but untracked files present (use "git add" to track)

So now I wanted to get rid of the two untracked files, but to my surprise a whole bunch of other files were removed too.

apache@cb772759c68a sugarcrm$ git clean -fd
Removing Disabled/upload:/
Removing LOG.txt
Removing custom/Extension/modules/Bugs/Ext/Language/
Removing custom/Extension/modules/Cases/Ext/Language/
Removing custom/Extension/modules/EmailAddresses/
Removing custom/Extension/modules/EmailParticipants/
Removing custom/Extension/modules/ForecastManagerWorksheets/
Removing custom/Extension/modules/ForecastWorksheets/
Removing custom/Extension/modules/Forecasts/
Removing custom/Extension/modules/Meetings/Ext/Layoutdefs/
Removing custom/Extension/modules/Meetings/Ext/WirelessLayoutdefs/
Removing custom/Extension/modules/Meetings/Ext/clients/
Removing custom/Extension/modules/ModuleBuilder/
Removing custom/Extension/modules/OutboundEmail/
Removing custom/Extension/modules/PdfManager/
Removing custom/Extension/modules/ProjectTask/Ext/Language/
Removing custom/Extension/modules/Quotas/
Removing custom/Extension/modules/Quotes/Ext/Dependencies/
Removing custom/Extension/modules/Targets/
Removing custom/Extension/modules/Tasks/Ext/Language/
Removing custom/Extension/modules/TimePeriods/
Removing custom/application/
Removing custom/install/
Removing custom/modules/Administration/
Removing custom/modules/Bugs/
Removing custom/modules/Cases/
Removing custom/modules/Contracts/
Removing custom/modules/Emails/
Removing custom/modules/HHP_Products/
Removing custom/modules/KBContents/
Removing custom/modules/Project/
Removing custom/modules/ProjectTask/
Removing custom/modules/ProspectLists/
Removing custom/modules/Prospects/
Removing custom/modules/Quotas/
Removing custom/modules/Reports/
Removing custom/modules/RevenueLineItems/
Removing custom/modules/Schedulers/
Removing custom/modules/Tags/
Removing custom/modules/Teams/
Removing custom/modules/hhp_assignment_zip/
Removing custom/modules/hhp_zipcode/
Removing custom/working/modules/Calls/
Removing custom/working/modules/Leads/clients/
Removing deploy_backup/
Removing deploy_log/
Removing dist/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/
Removing vendor/sugarcrm/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/

First point: the removed files were not shown by git status, so obviously they were covered by the .gitignore "mask"... Can anyone explain how any of these files match any of the patterns in .gitignore? For example, vendor/sugarcrm/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/ ... Can anyone help me build a proper .gitignore?

Second point: I thought that .gitignore "protects" these unversioned files from git clean, that git literally does not take any action upon them. Evidently it does delete them... How can I avoid deleting unversioned files while using git clean?

EDIT: I confused git clean with git rm, I was talking about git clean the whole time

EDIT 2: it turned out that the deleted directories which didn't match the .gitignore were "empty" after all (they had subdirectories, but the directory tree contained no files).


Solution

  • TL;DR

    You've misinterpreted what git clean removes by default and with -d. (Note: I'm not a big fan of git clean myself; it's way too easy to have it remove precious files.)

    Long

    As phd notes, listing a file in .gitignore stops git clean from removing it by default. However, git clean is (significantly) more complicated than that. We'll get into this in a bit.

    First, though, let's address one peculiarity of .gitignore entries. If you already know all this (but nobody seems to :-) ) you can skip down to the git clean-specific sections below.

    1. A file that is tracked (is in the index right now) is never ignored: whether it matches a .gitignore or equivalent (e.g., .git/info/exclude) pattern is irrelevant.

      The phrase "is in the index right now" means just that. When you use git add or git rm --cached to add or remove a file, that changes its tracked-ness. You can also use git ls-files --stage to dump out a complete list of every file in the index along with its staging data (mode, hash, and stage-slot number), or omit --stage to get just the names.

    2. A file (not a directory) that has been found by Git, and that is not in the index right now, is untracked. Git does not store directories, so directories never appear in the index.[1] Tracked or untracked is purely a property of files.

    3. An untracked file can also be an ignored file. If so, git add won't add it, even if you name it explicitly on the command line (though you can both name it explicitly and use --force to add it).

      This means files (but not directories) fall into one of three categories: tracked, untracked (only), or untracked-and-ignored. This matters for git status, which only complains about untracked files (not untracked-and-ignored), but also in a moment for git clean as well.

    4. Last, when Git is doing a full directory-tree scan (as in git add . for instance) and encounters a directory that it might be able to skip (one with no tracked files within it), Git will check whether the directory itself matches a .gitignore pattern and, if so, will not look inside it. This speeds up git status and git add -A / git add . on such directories (sometimes enormously, if you can ignore an entire vendor tree or SDK for instance).
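The three file categories from points 1 to 3 can be observed directly with git ls-files. The following is only a sketch in a throwaway repository; the file names (tracked.txt, untracked.txt, debug.log) and the `*.log` pattern are made up for illustration, and it assumes no global ignore file interferes.

```shell
# Sketch: classify files as tracked, untracked, or untracked-and-ignored.
# All file names below are hypothetical.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

echo '*.log' > .gitignore
touch tracked.txt untracked.txt debug.log
git add .gitignore tracked.txt      # these two are now in the index

tracked=$(git ls-files)                                        # index contents
untracked=$(git ls-files --others --exclude-standard)          # untracked (only)
ignored=$(git ls-files --others --ignored --exclude-standard)  # untracked-and-ignored

printf 'tracked:\n%s\nuntracked:\n%s\nignored:\n%s\n' \
    "$tracked" "$untracked" "$ignored"

# Dropping the index entry makes the file untracked again;
# the working-tree copy is left alone.
git rm --cached -q tracked.txt
after_rm=$(git ls-files --others --exclude-standard)
printf 'untracked after git rm --cached:\n%s\n' "$after_rm"
```

Here debug.log should land in the ignored list only, and tracked.txt should move from the first list to the untracked list once its index entry is removed.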

    Rule 4 is why, if you want to not ignore particular file paths that live underneath some directory path, you must instruct Git to specifically not-ignore the directory. If you ignore the directory, Git may never look inside the directory. This affects these three lines in particular:

    custom/Extenstion/application/Ext/Language
    !custom/Extenstion/application/Ext/Language/cs_CZ.*
    !custom/Extenstion/application/Ext/Language/en_US.*
    

    If you have ignored the entire directory custom/Extenstion/application/Ext/Language, Git won't look inside it and will never find any file matching custom/Extenstion/application/Ext/Language/cs_CZ.* to un-ignore it. It's therefore necessary to exempt the directory itself from ignored status: you should change the first line to read custom/Extenstion/application/Ext/Language/*, so that Git must look inside the directory. That pattern ignores everything in the directory, and the subsequent lines ending with cs_CZ.* and en_US.* then override the ignored status for the Czech and US-English files.
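The difference between ignoring a directory and ignoring its contents can be checked in a scratch repository. This is a minimal sketch, not the original setup: the directory name lang and the files en_us.php and de_DE.php are hypothetical stand-ins for the much longer paths in the question.

```shell
# Sketch: "dir" versus "dir/*" when a later line tries to un-ignore a file.
# Directory and file names are hypothetical stand-ins.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir lang
touch lang/en_us.php lang/de_DE.php

# Variant A: the directory itself is ignored, so Git never looks inside
# and the negated pattern has nothing to apply to.
printf '%s\n' 'lang' '!lang/en_us.php' > .gitignore
variant_a=$(git ls-files --others --exclude-standard)

# Variant B: only the directory's contents are ignored, so Git must scan
# the directory and the negated pattern can rescue en_us.php.
printf '%s\n' 'lang/*' '!lang/en_us.php' > .gitignore
variant_b=$(git ls-files --others --exclude-standard)

printf 'variant A untracked files:\n%s\n' "$variant_a"
printf 'variant B untracked files:\n%s\n' "$variant_b"
```

In variant A only the (untracked) .gitignore itself should be listed; in variant B lang/en_us.php should appear as well, while lang/de_DE.php stays ignored.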


    [1] In fact, they can appear in the index, but only so as to be treated as special cases. git ls-files, which can show you the index contents, skips right over them.


    Using git clean -d clearly modifies Rule 4

    Git can only remove a directory if it's empty. This is a general OS-enforced rule: if a directory d contains some files d/f1, d/f2, and so on, and you were to remove d without removing the files first, you'd have a problem with the files. The system forces you to first remove the files within the directory. This applies to sub-directories as well: you can't remove d if d/sub exists even if d/sub is itself an empty directory. Only empty directories can be removed.
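The OS-level rule is easy to see outside Git. A small sketch using a scratch directory (the names d and sub follow the paragraph above; the variable names are made up):

```shell
# Sketch: rmdir refuses a directory that still has an (empty) sub-directory.
set -u
work=$(mktemp -d)
mkdir -p "$work/d/sub"        # d contains only an empty sub-directory

if rmdir "$work/d" 2>/dev/null; then
    first_try=removed
else
    first_try=refused         # d is not empty: d/sub still exists
fi

rmdir "$work/d/sub"           # remove the contents first...
rmdir "$work/d"               # ...then the now-empty parent can go
if [ -d "$work/d" ]; then
    second_try=still_there
else
    second_try=removed
fi

echo "first attempt: $first_try, second attempt: $second_try"
```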

    Running git clean without -d not only leaves Rule 4 installed, but actually extends it. For instance, in the example we started with, Git notices that (1) custom/Extenstion/application/Ext/Language is a directory; (2) the directory matches an ignore pattern; so (3) provided there are no files in custom/Extenstion/application/Ext/Language that are already tracked, Git can and will skip the entire directory (and of course not remove it, since git clean is running without -d).

    Suppose that there's another directory named xyzzy/ that has no files listed in the index. This directory might be completely empty. In that case, there are no untracked files within it, by definition; so git clean without -d should do nothing to it. Or it might have files; these files are by definition untracked (and hence may be untracked-and-ignored), but you said not to remove directories, so git clean still doesn't even bother to look inside. This is the slightly odd case: Git often doesn't bother to look inside unknown directories.[2] (You see this with git status as well: you have to use git status -uall to find the files inside a mystery directory. But git add -A or git add . has to look inside, unless the directory is ignored, which is why Rule 4 is a bit complicated in the general case.)
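The git status side of this can be reproduced in a scratch repository. A sketch, reusing the directory name xyzzy from above with two hypothetical files inside it:

```shell
# Sketch: git status summarizes an unknown directory unless asked for detail.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir xyzzy
touch xyzzy/a.txt xyzzy/b.txt     # hypothetical file names

# Default mode ("normal"): only the directory name is reported.
summary=$(git status --porcelain --untracked-files=normal)

# -uall is short for --untracked-files=all: every file is listed.
detail=$(git status --porcelain --untracked-files=all)

printf 'normal:\n%s\nall:\n%s\n' "$summary" "$detail"
```

The first listing should contain a single `?? xyzzy/` entry, the second one entry per file.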

    Running with -d, though, apparently throws Rule 4 out completely. Again, in order to remove a directory, Git must first remove all the files within the directory. To do that, Git has to enumerate the contents as well. So if you tell git clean to use -d, it seems appropriate to disable Rule 4 entirely. The directory-ness of a path name will force Git to scan the directory's contents. Either we already needed to look inside because there are tracked files, or we need to look inside to remove files in order to remove the directory.
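A dry run shows the difference between the two modes. This sketch assumes a scratch repository with one stray file at the top level and one unknown directory; stray.txt and a.txt are hypothetical names, and LC_ALL=C is set only to keep Git's messages in English.

```shell
# Sketch: git clean leaves unknown directories alone unless -d is given.
set -eu
export LC_ALL=C
repo=$(mktemp -d)
cd "$repo"
git init -q .
touch stray.txt
mkdir xyzzy
touch xyzzy/a.txt

without_d=$(git clean -n)       # dry run, files only
with_d=$(git clean -n -d)       # dry run, directories too

printf 'without -d:\n%s\nwith -d:\n%s\n' "$without_d" "$with_d"
```

Without -d only stray.txt should be reported; with -d the whole xyzzy/ directory should show up as well.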


    [2] Note that "unknown" is not the same as "untracked". It's not even a Git term; I've made it up here. However, as we'll see, it might be nice if Git did define the phrase "untracked directory".


    What git clean removes

    Running git clean -n will show you what it would remove. The listing uses some shorthand: removing a directory implies removing all the files within that directory, including (recursively) sub-directories with sub-files. Running with -n first is safer than going straight to -f: -f prints the same kind of listing as -n, but only for what it has already removed.
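A preview-then-delete sequence might look like the following sketch. It runs in a scratch repository with a single hypothetical untracked file, junk.tmp; LC_ALL=C is set only so the messages are in English.

```shell
# Sketch: preview with -n, then actually delete with -f.
set -eu
export LC_ALL=C
repo=$(mktemp -d)
cd "$repo"
git init -q .
touch junk.tmp

preview=$(git clean -n)                  # reports, deletes nothing
if [ -f junk.tmp ]; then after_preview=present; else after_preview=gone; fi

result=$(git clean -f)                   # reports and deletes
if [ -f junk.tmp ]; then after_force=present; else after_force=gone; fi

echo "$preview (file $after_preview)"
echo "$result (file $after_force)"
```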

    By default, git clean removes files that are untracked, but not files that are untracked-and-ignored. That is, go back to point 3 above and look at the three classifications of files: git clean removes the middle classification (only). Adding -X (uppercase X) tells Git: don't remove untracked-only files; instead, remove untracked-and-ignored files.

    Adding -x tells Git: don't read the usual ignore-directives files such as .gitignore. At this point, no files will be ignored, so that (regardless of which files are tracked) no files can be untracked-and-ignored. Combining this with -X would make no sense,[3] so git clean forbids you to use both -x and -X.

    Running git clean with -d adds empty-directory removal. Here, things get particularly squirrelly, though. It seems as though Git's tracked, untracked, and untracked-and-ignored classification breaks down a bit. The documentation says that -d will:
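The effect of the default mode, -X and -x can be compared with dry runs. This is a sketch in a scratch repository; notes.txt, debug.log and the `*.log` pattern are hypothetical, and the .gitignore is added to the index so that it is not itself a cleaning candidate.

```shell
# Sketch: which category each git clean mode targets (dry runs only).
set -eu
export LC_ALL=C
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo '*.log' > .gitignore
git add .gitignore                 # tracked, so never a candidate
touch notes.txt debug.log          # untracked / untracked-and-ignored

default_run=$(git clean -n)        # untracked (only)
only_ignored=$(git clean -n -X)    # untracked-and-ignored
everything=$(git clean -n -x)      # ignore rules switched off

# Git refuses the combination of -x and -X.
if git clean -n -x -X >/dev/null 2>&1; then
    both_flags=accepted
else
    both_flags=rejected
fi

printf 'default:\n%s\n-X:\n%s\n-x:\n%s\n-x -X: %s\n' \
    "$default_run" "$only_ignored" "$everything" "$both_flags"
```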

    Remove untracked directories in addition to untracked files.

    But Git has no definition of untracked directories. "Tracked-ness" is exclusively a property of files. We did see, in a footnote, that directories sneak into the index as invisible entities (for purposes of speeding up various Git operations), but that doesn't really mean that directories are tracked.

    We can make one up: an "untracked directory" might be a directory that contains no tracked files. I think (but have not proven to my own satisfaction) that this definition works and explains git clean's behavior. It would help a lot if the Git documentation actually defined this properly, though.
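Under that working definition, a directory tree that contains no files at all is also an "untracked directory", which matches what the question's second edit reports. A sketch with a hypothetical hollow tree, hollow/sub/deeper, in a scratch repository:

```shell
# Sketch: a file-less directory tree is invisible to git status
# but is a removal candidate for git clean -d.
set -eu
export LC_ALL=C
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p hollow/sub/deeper        # directories only, no files anywhere

status_out=$(git status --porcelain --untracked-files=all)
clean_out=$(git clean -n -d)      # dry run

printf 'status output: [%s]\nclean output:  [%s]\n' "$status_out" "$clean_out"
```

git status should print nothing here because there are no untracked files to report, while the dry run should list hollow/ for removal.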


    [3] Combining -x and -X with -e could have some practical uses, but Git still forbids this, at least as of today.