Tags: r, git, githooks, r-styler

After pre-commit hook: git add makes staged files disappear


I'm trying to set up a local pre-commit hook. It should check whether the edited files are styled according to the tidyverse style guide (using styler). Because my company does not allow direct access to GitHub, I cannot use precommit and have to set up the hook by editing the .git/hooks/pre-commit file.

Setup

This is what I tried. My .git/hooks/pre-commit:

#!/bin/bash

set -eo pipefail

CHANGED_FILES=$(git diff --name-only --cached --diff-filter=ACMR)

get_pattern_files() {
    pattern=$(echo "$*" | sed "s/ /\$\\\|/g")
    echo "$CHANGED_FILES" | { grep "$pattern$" || true; }
}

R_FILES=$(get_pattern_files .R)

# if R_FILES is not empty, run Rscript
if [[ -n "$R_FILES" ]]
then
    Rscript ./style.R $R_FILES
fi

# exit with 1, if Rscript failed
if [ $? -eq 0 ]; then
    exit 0
else
    exit 1
fi

and my ./style.R

#!/usr/bin/env Rscript
args <- commandArgs(trailingOnly = TRUE)
output <- styler::style_file(path = args)
if (any(output$changed) == TRUE) {quit(status = 1)}

Problem

When I edit a file, I can see it in the diff.

user@machine:~/r_template$ git diff
diff --git a/src/main.R b/src/main.R
index 8d2f097..dd1272d 100644
--- a/src/main.R
+++ b/src/main.R
@@ -1 +1 @@
-1 + 1
+1 +1           <-- this is what I have changed in the file

I add it with git add -u and then git commit. The hook gets called, and aborts the commit (because the Rscript exits with status 1) as expected.

user@machine:~/r_template$ git commit
Styling  1  files:
 src/main.R ℹ
────────────────────────────────────────
Status  Count   Legend
✓       0       File unchanged.
ℹ       1       File changed.
x       0       Styling threw an error.
────────────────────────────────────────
Please review the changes carefully!

and can see an (expected) edited file

user@machine:~/r_template$ git status
On branch feature/precommit-hooks
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   src/main.R

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   src/main.R

with the intended changes done by styler::style_file()

user@machine:~/r_template$ git diff
diff --git a/src/main.R b/src/main.R
index dd1272d..8d2f097 100644
--- a/src/main.R
+++ b/src/main.R
@@ -1 +1 @@
-1 +1
+1 + 1       <-- this is done by styler::style_file()

OK, so next I want to stage this edit, so I run git add -u. But then something happens that I cannot understand:

user@machine:~/r_template$ git status
On branch feature/precommit-hooks
nothing added to commit but untracked files present (use "git add" to track)

The staging area is completely empty. And still, the change in src/main.R is in effect. What is happening here?


Solution

  • The trick to understanding what's going on is this: The staging area is never empty, and never contains any changes. To which the obvious reply is what the f—, you obviously don't know what you're talking about because git status shows an empty staging area and git diff shows changes. And yes, they do. The trick is that these are strategically developed lies.

    As Emily Dickinson's poem Tell all the Truth but tell it slant points out, trying to see the whole picture all at once is overwhelming. The whole truth of Git's index / staging-area is that it contains a full copy of every file, ready to be committed. That is, there are at all times three copies of each file: the frozen copy in the HEAD (current) commit, an "in-between" copy, and the ordinary, editable copy in your working tree.

    The "in-between" version is simply a full copy of every file in Git's staging area, which Git also calls the index.
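The claim that the index holds a full copy of every file, even when nothing looks "staged", can be checked in a throwaway repository. This is only a sketch: the temporary directory, the demo identity, and the file names are assumptions made for illustration, not part of the question's setup.

```shell
#!/bin/sh
set -eu

# Throwaway repository; path, identity and file names are made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir src
echo "1 + 1" > src/main.R
echo "notes" > README
git add .
git commit -q -m "initial"

# After the commit, git status has nothing to report ...
status=$(git status --porcelain)

# ... yet the index still lists every tracked file, each with its blob hash.
index_entries=$(git ls-files --stage | wc -l | tr -d ' ')

echo "status output: '${status}'"
echo "files in index: ${index_entries}"
git ls-files --stage
```

Here git status prints nothing because the three copies agree, not because the index is empty.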

    When git diff or git diff --cached shows changes, what Git is doing is playing the Spot the Difference game with two snapshots. There are two full pictures, but the two have some differences. Git omits, from a diff output, all the stuff that is the same, because that stuff is boring! We want to know what's different.

    The same thing happens with git status, except that instead of showing us what's different, it tells us which file names have one or more differences. So this:

    user@machine:~/r_template$ git status
    On branch feature/precommit-hooks
    Changes to be committed:
      (use "git reset HEAD <file>..." to unstage)
    
            modified:   src/main.R
    

    means that when Git compares the full snapshot of every file in the HEAD (current) commit to the full snapshot of every file in the staging area, one file is different: src/main.R. Meanwhile, this second part:

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git checkout -- <file>..." to discard changes in working directory)
    
            modified:   src/main.R
    

    means that when Git (separately) compares the full snapshot in the staging area to the full snapshot in your working tree, one file is different: src/main.R (again).
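The two comparisons behind git status can also be run by hand: git diff --cached compares HEAD with the index, and plain git diff compares the index with the working tree. A minimal sketch, assuming a scratch repository with made-up contents:

```shell
#!/bin/sh
set -eu

# Scratch repository; path, identity and file contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir src
echo "1 + 1" > src/main.R
git add src/main.R
git commit -q -m "initial"

# First edit: copy it into the index with git add.
echo "1 +1" > src/main.R
git add src/main.R

# Second edit: leave it in the working tree only.
echo "2 + 2" > src/main.R

# HEAD snapshot vs index snapshot ("Changes to be committed").
staged=$(git diff --cached --name-only)

# Index snapshot vs working tree ("Changes not staged for commit").
unstaged=$(git diff --name-only)

echo "differs between HEAD and index:         ${staged}"
echo "differs between index and working tree: ${unstaged}"
```

The same file name shows up in both lists because each list is the result of a separate comparison between two full snapshots.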

    If we now change the index (aka staging) copy of src/main.R—we're allowed to replace it wholesale at any time, and formatting hooks do that sort of trick—then the difference from HEAD-to-index changes. Perhaps it even disappears entirely! And, separately, the difference from index-to-working-tree changes as well.

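    In this particular case, what happened is that the frozen, archived HEAD copy and the working tree copy already matched: styler turned your 1 +1 back into 1 + 1, which is exactly what HEAD contains (the blob hash 8d2f097 appears as the old side of your first diff and as the new side of your second). Only the index/staging-area copy was different. Running git add -u told Git: replace the index copy with the working tree copy. It did that, and now all three copies match again, so git status has no differences to report.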
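The sequence from the question can be replayed without R. In the sketch below, the styler step is simulated by writing the restyled content by hand (an assumption; no hook is installed), and the three copies are compared by blob hash. The repository path and identity are likewise made up.

```shell
#!/bin/sh
set -eu

# Scratch repository standing in for ~/r_template (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir src
echo "1 + 1" > src/main.R
git add src/main.R
git commit -q -m "initial"

# The user's edit, staged with git add -u.
echo "1 +1" > src/main.R
git add -u

# Stand-in for styler::style_file(): the working tree copy is restyled,
# which happens to restore exactly the content stored in HEAD.
echo "1 + 1" > src/main.R

head_blob=$(git rev-parse HEAD:src/main.R)   # copy in the current commit
index_blob=$(git rev-parse :src/main.R)      # copy in the index
work_blob=$(git hash-object src/main.R)      # copy in the working tree
status_before=$(git status --porcelain)

# Replace the index copy with the working tree copy.
git add -u

index_blob_after=$(git rev-parse :src/main.R)
status_after=$(git status --porcelain)

echo "before: HEAD=${head_blob} index=${index_blob} worktree=${work_blob}"
echo "after:  index=${index_blob_after}"
echo "status after second add: '${status_after}'"
```

Before the second git add -u, the HEAD and working tree hashes are equal and only the index hash differs. Afterwards all three are equal, so there is nothing left to commit.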

    Footnote: "The staging area is never empty, and never contains any changes" is obviously an overstatement. It's also wrong because it is possible to have a completely-empty staging area, by removing every file from it. It's just not normal to do that. The only time you normally have that is in a fresh, new, totally-empty repository, when you're working on your first commit. But it's still a good way to remember that the index / staging-area has a full snapshot of every file. In fact, the point of this is that the full snapshot in the index area is, in effect, the snapshot you're proposing to put into your next commit.
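For completeness, the two situations the footnote mentions (a fresh repository, and an index emptied on purpose) might be reproduced like this. It is a sketch under assumed names and paths; git rm --cached removes entries from the index while leaving the working tree files alone.

```shell
#!/bin/sh
set -eu

# Scratch repository; path, identity and file name are made up for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.com"
git config user.name "Demo"

# A brand-new repository: the index has no entries yet.
fresh_count=$(git ls-files --stage | wc -l | tr -d ' ')

echo "1 + 1" > main.R
git add main.R
git commit -q -m "initial"

# Deliberately remove every file from the index, keeping the working tree.
git rm -r -q --cached .
emptied_count=$(git ls-files --stage | wc -l | tr -d ' ')

echo "index entries in fresh repository: ${fresh_count}"
echo "index entries after git rm --cached: ${emptied_count}"
git status --short
```

In the second case git status reports main.R both as a staged deletion (HEAD has it, the index does not) and as an untracked file (the working tree has it, the index does not).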

    (There's also the much trickier case of multiple index / staging-areas.)