git msysgit

git-cat-file output


When I ran git cat-file --batch on a commit, it printed 'missing...', even though the commit exists. Why could this happen? When I ran cat-file with the -t switch, it printed 'commit', which is what I expected. Can anyone explain this? I am new to git. Thanks.

EDIT: I have figured out the cause. msysgit expects LF while ENTER generates CRLF.


Solution

  • I am not sure git cat-file --batch is supposed to work the way you mention in your question.
    (It might after git 2.8, March 2016, see below)

    Even in the "GitMagic book", in a Unix environment, git cat-file is used the way sinelaw mentions in the comments:

    Check this file does indeed contain the above by typing:

    $ echo 05b217bb859794d08bb9e4f7f04cbda4b207fbe9 | git cat-file --batch
    

    As the OP Alex.Shen mentions above, this is a newline issue:
    git commands always expect LF (Line Feed, U+000A), not the Windows CRLF sequence (CR, U+000D, followed by LF, U+000A).
    With the '|', the input carries the EOL character of the msysgit bash shell (LF), so it always works.
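
    A minimal sketch of the same idea for input that does contain CRLF (for example, a list of object names saved by a Windows editor): delete the CR bytes before the stream reaches git. The helper name strip_cr is made up for this illustration.

    ```shell
    # Hypothetical helper: drop every CR byte so that each line ends in a bare LF.
    strip_cr() {
        tr -d '\r'
    }

    # Simulate one CRLF-terminated object name and keep what git would receive.
    cleaned=$(printf '05b217bb859794d08bb9e4f7f04cbda4b207fbe9\r\n' | strip_cr)
    printf '%s\n' "$cleaned"
    ```

    Inside a repository, the cleaned stream can then be piped on, as in strip_cr < ids.txt | git cat-file --batch, where ids.txt stands for any CRLF-terminated file of object names.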


    Note: Git 2.5+ (Q2 2015) adds support for symlinks with git cat-file --batch.
    (New Git releases are available for Windows at github.com/git-for-windows/git/releases)

    See commit 122d534 by David Turner (csusbdt), 20 May 2015.
    (Merged by Junio C Hamano -- gitster -- in commit 67f0b6f, 01 Jun 2015)

    cat-file: add --follow-symlinks to --batch

    "git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax.

    E.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt.

    With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead.
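
    A small sketch of the difference, assuming git 2.5 or later and a filesystem that supports symlinks. It builds a throwaway repository; the file names and the demo identity are placeholders, not taken from the commit.

    ```shell
    # Throwaway repository containing a file and an in-tree symlink pointing at it.
    repo=$(mktemp -d)
    cd "$repo" || exit 1
    git init -q .
    mkdir Documentation
    echo 'release notes' > Documentation/notes.txt
    ln -s Documentation/notes.txt RelNotes
    git add .
    git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add symlink'

    # Without the option, the answer describes the symlink blob itself.
    plain=$(echo HEAD:RelNotes | git cat-file --batch-check)
    # With the option, the answer describes the blob the link points at.
    followed=$(echo HEAD:RelNotes | git cat-file --batch-check --follow-symlinks)
    # Reference: asking for the target path directly.
    target=$(echo HEAD:Documentation/notes.txt | git cat-file --batch-check)
    printf '%s\n%s\n' "$plain" "$followed"
    ```

    The second printed line should match what a direct query for HEAD:Documentation/notes.txt returns, while the first one reports the size of the link text.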


    Update February 2016:

    Git 2.8 adds support for CRLF to some git commands:

    See commit a551843, commit 933bea9, commit 1536dd9, commit b42ca3d, commit 692dfdf, commit 3f16396, commit 18814d0, commit 1f3b1ef, commit 72e37b6, commit 6e8d46f, commit c0353c7 (28 Oct 2015) by Junio C Hamano (gitster).
    (Merged by Junio C Hamano -- gitster -- in commit 0175655, 03 Feb 2016)

    In particular, commit b42ca3d makes cat-file read its input with strbuf.c#strbuf_getline(), which splits lines on LF and also trims a CR that precedes it, so CRLF-terminated input is accepted.

    With git 2.8:

    cat-file: read batch stream with strbuf_getline()

    It is possible to prepare a text file with a DOS editor and feed it as a batch command stream to the command.


    Note that before Git 2.33 (Q3 2021), "git cat-file --batch-all-objects"(man) misbehaved when --batch is in use and did not ask for certain object traits.

    See commit ee02ac6, commit e16acc8 (03 Jun 2021) by ZheNing Hu (adlternative).
    (Merged by Junio C Hamano -- gitster -- in commit 5d96bcb, 13 Jul 2021)

    cat-file: handle trivial --batch format with --batch-all-objects

    Helped-by: Jeff King
    Signed-off-by: ZheNing Hu
    Acked-by: Jeff King

    The --batch code to print an object assumes we found out the type of the object from calling oid_object_info_extended().
    This is true for the default format, but even in a custom format, we manually modify the object_info struct to ask for the type.

    This assumption was broken by 845de33 (cat-file: avoid noop calls to sha1_object_info_extended, 2016-05-18, Git v2.9.0-rc1 -- merge).
    That commit skips the call to oid_object_info_extended() entirely when --batch-all-objects is in use, and the custom format does not include any placeholders that require calling it.

    Or, when the custom format only includes placeholders like %(objectname) or %(rest), oid_object_info_extended() will not get the type of the object.

    This results in an error when we try to confirm that the type didn't change:

    $ git cat-file --batch=batman --batch-all-objects
      batman 
      fatal: object 0000239 changed type!?  
    

    and also has other subtle effects (e.g., we'd fail to stream a blob, since we don't realize it's a blob in the first place).

    We can fix this by flipping the order of the setup.
    The check for "do we need to get the object info" must come after we've decided whether we need to look up the type.


    With Git 2.36 (Q2 2022), "git cat-file"(man) learns --batch-command mode, which is a more flexible interface than the existing "--batch" or "--batch-check" modes and allows different kinds of inquiries to be made.

    See commit 440c705, commit 4cf5d53, commit ac4e58c, commit a2c7552 (18 Feb 2022) by John Cai (john-cai).
    (Merged by Junio C Hamano -- gitster -- in commit d169d51, 09 Mar 2022)

    cat-file: add --batch-command mode

    Helped-by: Ævar Arnfjörð Bjarmason
    Signed-off-by: John Cai

    Add a new flag --batch-command that accepts commands and arguments from stdin, similar to git-update-ref(man) --stdin.

    At GitLab, we use a pair of long running cat-file processes when accessing object content.
    One for iterating over object metadata with --batch-check, and the other to grab object contents with --batch.

    However, if we had --batch-command, we wouldn't need to keep both processes around, and instead just have one --batch-command process where we can flip between getting object info, and getting object contents.
    Since we have a pair of cat-file processes per repository, this means we can get rid of roughly half of long lived git cat-file(man) processes.
    Given there are many repositories being accessed at any given time, this can lead to huge savings.

    git cat-file --batch-command(man)

    will enter an interactive command mode whereby the user can enter in commands and their arguments that get queued in memory:

    <command1> [arg1] [arg2] LF 
    <command2> [arg1] [arg2] LF  
    

    When --buffer mode is used, commands will be queued in memory until a flush command is issued that executes them:

    flush LF  
    

    The reason for a flush command is that when a consumer process (A) talks to a git cat-file process (B) and interactively writes to and reads from it in --buffer mode, (A) needs to be able to control when the buffer is flushed to stdout.

    Currently, from (A)'s perspective, the only way is to either

    1. kill (B)'s process

    2. send an invalid object to stdin.

    Option 1 is not ideal from a performance perspective, as it requires spawning a new cat-file process each time, and option 2 is hacky and not a good long-term solution.

    With this mechanism of queueing up commands and letting (A) issue a flush command, process (A) can control when the buffer is flushed and can guarantee it will receive all of the output when in --buffer mode.
    --batch-command also will not allow (B) to flush to stdout until a flush is received.

    This patch adds the basic structure for adding commands, which can be extended in the future to add more commands.
    It also adds the following two commands (on top of the flush command):

    contents <object> LF
    info <object> LF
    

    The contents command takes an <object> argument and prints out the object contents.

    The info command takes an <object> argument and prints out the object metadata.

    These can be used in the following way with --buffer:

    info <object> LF
    contents <object> LF
    contents <object> LF
    info <object> LF
    flush LF
    info <object> LF
    flush LF
    

    When used without --buffer:

    info <object> LF
    contents <object> LF
    contents <object> LF
    info <object> LF
    info <object> LF
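
    As a sketch of how a caller might produce such a stream from a script (the function name build_commands and the HEAD revision are placeholders chosen for this example, not part of the patch):

    ```shell
    # Hypothetical helper: queue two lookups for one object, then request a flush.
    build_commands() {
        printf 'info %s\n' "$1"
        printf 'contents %s\n' "$1"
        printf 'flush\n'
    }

    # Capture the stream so it can be inspected before being sent anywhere.
    commands=$(build_commands HEAD)
    printf '%s\n' "$commands"
    ```

    Inside a repository with Git 2.36 or later, piping build_commands HEAD into git cat-file --batch-command --buffer should produce the info line followed by the contents once the flush line is read.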
    

    git cat-file now includes in its man page:

    --batch-command

    --batch-command=<format>

    Enter a command mode that reads commands and arguments from stdin. May only be combined with --buffer, --textconv or --filters. In the case of --textconv or --filters, the input lines also need to specify the path, separated by whitespace. See the section BATCH OUTPUT below for details.

    --batch-command recognizes the following commands:

    contents <object>

    Print object contents for object reference <object>. This corresponds to the output of --batch.

    info <object>

    Print object info for object reference <object>. This corresponds to the output of --batch-check.

    flush

    Used with --buffer to execute all preceding commands that were issued since the beginning or since the last flush was issued. When --buffer is used, no output will come until a flush is issued. When --buffer is not used, commands are flushed each time without issuing flush.

    git cat-file now includes in its man page:

    When --batch-command is given, cat-file will read commands from stdin, one per line, and print information based on the command given. With --batch-command, the info command followed by an object will print information about the object the same way --batch-check would, and the contents command followed by an object prints contents in the same way --batch would.

    git cat-file now includes in its man page:

    If --batch is specified, or if --batch-command is used with the contents command, the object information is followed by the object contents (consisting of %(objectsize) bytes), followed by a newline.


    With Git 2.38 (Q3 2022), operating modes like "--batch" of "git cat-file"(man) command learned to take NUL-terminated input, instead of one-item-per-line.

    See commit db9d67f, commit 3639fef (22 Jul 2022) by Taylor Blau (ttaylorr).
    (Merged by Junio C Hamano -- gitster -- in commit 1e92768, 05 Aug 2022)

    builtin/cat-file.c: support NUL-delimited input with -z

    Signed-off-by: Taylor Blau

    When callers are using cat-file via one of the stdin-driven --batch modes, all input is newline-delimited.

    This presents a problem when callers wish to ask about, e.g. tree-entries that have a newline character present in their filename.

    To support this niche scenario, introduce a new -z mode to the --batch, --batch-check, and --batch-command suite of options that instructs cat-file to treat its input as NUL-delimited, allowing the individual commands themselves to have newlines present.

    The refactoring here is slightly unfortunate, since we turn loops like:

    while (strbuf_getline(&buf, stdin) != EOF)
    

    into:

    while (1) {
        int ret;
        if (opt->nul_terminated)
            ret = strbuf_getline_nul(&input, stdin);
        else
            ret = strbuf_getline(&input, stdin);

        if (ret == EOF)
            break;
    }
    

    It's tempting to think that we could use strbuf_getwholeline() and specify either \n or \0 as the terminating character.
    But for input on platforms that include a CR character preceding the LF, this wouldn't quite be the same, since strbuf_getline(...) will trim any trailing CR, while strbuf_getwholeline(&buf, stdin, '\n') will not.

    git cat-file now includes in its man page:

    -z

    Only meaningful with --batch, --batch-check, or --batch-command; input is NUL-delimited instead of newline-delimited.
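
    A sketch of how such input could be produced from a shell, assuming a hypothetical helper nul_records and an invented path that contains a newline:

    ```shell
    # Hypothetical helper: print each argument followed by a NUL byte.
    nul_records() {
        printf '%s\0' "$@"
    }

    # Invented path with an embedded newline (command substitution keeps inner newlines).
    odd_path=$(printf 'dir/odd\nname.txt')

    # Two queries go in; count the NUL terminators to confirm two records come out.
    count=$(nul_records 'HEAD:README' "HEAD:$odd_path" | tr -cd '\0' | wc -c | tr -d ' ')
    printf 'records: %s\n' "$count"   # prints "records: 2"
    ```

    With Git 2.38 or later, the same stream would be fed to git cat-file --batch-check -z, so the embedded newline stays part of the second query instead of splitting it.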


    With Git 2.42 (Q3 2023), "git cat-file --batch"(man) and friends learned -Z that uses NUL delimiter for both input and output.

    See commit f79e188, commit 3217f52, commit af35e56, commit b116c77, commit c7309f6 (06 Jun 2023) by Patrick Steinhardt (pks-t).
    (Merged by Junio C Hamano -- gitster -- in commit a9ea4c2, 22 Jun 2023)

    cat-file: add option '-Z' that delimits input and output with NUL

    Co-authored-by: Toon Claes
    Signed-off-by: Patrick Steinhardt

    In db9d67f ("builtin/cat-file.c: support NUL-delimited input with -z", 2022-07-22, Git v2.38.0-rc0 -- merge listed in batch #10), we have introduced a new mode to read the input via NUL-delimited records instead of newline-delimited records.
    This allows the user to query for revisions that have newlines in their path component.
    While unusual, such queries are perfectly valid and thus it is clear that we should be able to support them properly.

    Unfortunately, the commit only changed the input to be NUL-delimited, but didn't change the output at the same time.
    While this is fine for queries that are processed successfully, it is less so for queries that aren't.
    In the case of missing commits for example the result can become entirely unparsable:

    $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
       git cat-file --batch -z
    7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
    1234567890
    commit missing 
    

    This is of course a crafted query that is intentionally gaming the deficiency, but more benign queries that contain newlines would have similar problems.

    Ideally, we should have also changed the output to be NUL-delimited when -z is specified to avoid this problem.
    As the input is NUL-delimited, it is clear that the output in this case cannot ever contain NUL characters by itself.
    Furthermore, Git does not allow NUL characters in revisions anyway, further stressing the point that using NUL-delimited output is safe.
    The only exception is of course the object data itself, but as git-cat-file(1) prints the size of the object data clients should read until that specified size has been consumed.

    But even though -z was introduced only a few releases ago in Git v2.38.0, changing the output format retroactively to also NUL-delimit output would be a backwards incompatible change.
    And while one could make the argument that the output is inherently broken already, we need to assume that there are existing users out there that use it just fine given that revisions containing newlines are quite exotic.

    Instead, introduce a new option -Z that switches to NUL-delimited input and output.
    While this new option could arguably only switch the output format to be NUL-delimited, the consequence would be that users have to always specify both -z and -Z when the input may contain newlines.
    On the other hand, if the user knows that there never will be newlines in the input, they don't have to use either of those options.
    There is thus no use case that would warrant treating input and output format separately, which is why we instead opt to "do the right thing" and have -Z mean to NUL-terminate both formats.

    The old -z option is marked as deprecated with a hint that its output may become unparsable.
    It is thus hidden both from the synopsis as well as the command's help output.

    git cat-file now includes in its man page:

    -Z

    Only meaningful with --batch, --batch-check, or --batch-command; input and output is NUL-delimited instead of newline-delimited.

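    For the older -z option, git cat-file now includes in its man page:

    -z

    Only meaningful with --batch, --batch-check, or --batch-command; input is NUL-delimited instead of newline-delimited. This option is deprecated in favor of -Z as the output can otherwise be ambiguous.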

    git cat-file now includes in its man page:

    Alternatively, when -Z is passed, the line feeds in any of the above examples are replaced with NUL terminators. This ensures that output will be parsable if the output itself would contain a linefeed and is thus recommended for scripting purposes.
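
    To illustrate why the NUL terminator helps a script, here is a sketch that counts records in a simulated --batch-check -Z reply. The reply is hand-written for the example (two queries reported as missing, the first one containing a newline), not captured from git.

    ```shell
    # Simulated reply: two "<object> missing" records, each terminated by NUL.
    # The first object name contains a newline on purpose.
    simulated_reply() {
        printf 'odd\nname missing\0HEAD:nope missing\0'
    }

    # Counting line feeds miscounts the records; counting NUL bytes does not.
    by_lf=$(simulated_reply | tr -cd '\n' | wc -c | tr -d ' ')
    by_nul=$(simulated_reply | tr -cd '\0' | wc -c | tr -d ' ')
    printf 'LF count: %s, NUL count: %s\n' "$by_lf" "$by_nul"   # prints "LF count: 1, NUL count: 2"
    ```

    A consumer that splits on NUL sees the two records it asked for, whereas one that splits on line feeds would cut the first record in half.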