When I ran git cat-file --batch on a commit, it output 'missing...', but the commit does exist. Why could this happen? When I ran cat-file with the -t switch, it just output 'commit', which is what I expected. Can anyone explain this? I am new to git. Thanks.
EDIT: I have figured out the cause. msysgit expects LF while ENTER generates CRLF.
I am not sure git cat-file --batch is supposed to work the way you mention in your question.
(It might after Git 2.8, March 2016; see below.)
Even in the "GitMagic book", in a Unix environment, git cat-file is used the way sinelaw mentions in the comments:
Check this file does indeed contain the above by typing:
$ echo 05b217bb859794d08bb9e4f7f04cbda4b207fbe9 | git cat-file --batch
As the OP Alex.Shen mentions above, this is a newline issue:
Git commands always expect LF (Line Feed, U+000A), not the Windows CRLF sequence (CR+LF: CR (U+000D) followed by LF (U+000A)).
With the '|' pipe, the input uses the EOL character of the msysgit bash shell (LF), so it always works.
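If the object names come from a file saved with CRLF line endings, one possible workaround for a Git that insists on LF is to strip the carriage returns before the stream reaches cat-file. This is only a minimal sketch: the helper name normalize_eol and the file name ids.txt are made up for illustration.

```shell
# Hypothetical helper: drop every CR byte so each line ends in a bare LF.
normalize_eol() {
    tr -d '\r'
}

# Assumed usage inside a repository, with ids.txt holding one object name per line:
#   normalize_eol < ids.txt | git cat-file --batch-check
```

Note that tr -d '\r' removes every carriage return, not only those at the end of a line, which should be harmless for a list of object names.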
Note: Git 2.5+ (Q2 2015) added support for symlinks with git cat-file --batch.
(New Git releases are available for Windows at github.com/git-for-windows/git/releases)
See commit 122d534 by David Turner (csusbdt), 20 May 2015.
(Merged by Junio C Hamano -- gitster -- in commit 67f0b6f, 01 Jun 2015)
cat-file: add --follow-symlinks to --batch
"git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax.
E.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt.
With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead.
Update February 2016:
Git 2.8 adds support for CRLF to some git commands:
See commit a551843, commit 933bea9, commit 1536dd9, commit b42ca3d, commit 692dfdf, commit 3f16396, commit 18814d0, commit 1f3b1ef, commit 72e37b6, commit 6e8d46f, commit c0353c7 (28 Oct 2015) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 0175655, 03 Feb 2016)
In particular, commit b42ca3d uses strbuf.c#strbuf_getline() (which can take a byte other than LF or NUL as the line terminator).
With git 2.8:
cat-file: read batch stream with strbuf_getline()
It is possible to prepare a text file with a DOS editor and feed it as a batch command stream to the command.
Note that before Git 2.33 (Q3 2021), "git cat-file --batch-all-objects" misbehaved when --batch is in use and did not ask for certain object traits.
See commit ee02ac6, commit e16acc8 (03 Jun 2021) by ZheNing Hu (adlternative).
(Merged by Junio C Hamano -- gitster -- in commit 5d96bcb, 13 Jul 2021)
cat-file: handle trivial --batch format with --batch-all-objects
Helped-by: Jeff King
Signed-off-by: ZheNing Hu
Acked-by: Jeff King
The --batch code to print an object assumes we found out the type of the object from calling oid_object_info_extended().
This is true for the default format, but even in a custom format, we manually modify the object_info struct to ask for the type.

This assumption was broken by 845de33 (cat-file: avoid noop calls to sha1_object_info_extended, 2016-05-18, Git v2.9.0-rc1 -- merge).
That commit skips the call to oid_object_info_extended() entirely when --batch-all-objects is in use, and the custom format does not include any placeholders that require calling it.
Or when the custom format only includes placeholders like %(objectname) or %(rest), oid_object_info_extended() will not get the type of the object.

This results in an error when we try to confirm that the type didn't change:

    $ git cat-file --batch=batman --batch-all-objects
    batman
    fatal: object 0000239 changed type!?

and also has other subtle effects (e.g., we'd fail to stream a blob, since we don't realize it's a blob in the first place).
We can fix this by flipping the order of the setup.
The check for "do we need to get the object info" must come after we've decided whether we need to look up the type.
With Git 2.36 (Q2 2022), "git cat-file" learns --batch-command mode, which is a more flexible interface than the existing "--batch" or "--batch-check" modes, to allow different kinds of inquiries to be made.
See commit 440c705, commit 4cf5d53, commit ac4e58c, commit a2c7552 (18 Feb 2022) by John Cai (john-cai).
(Merged by Junio C Hamano -- gitster -- in commit d169d51, 09 Mar 2022)
cat-file: add --batch-command mode
Helped-by: Ævar Arnfjörð Bjarmason
Signed-off-by: John Cai
Add a new flag --batch-command that accepts commands and arguments from stdin, similar to git-update-ref --stdin.

At GitLab, we use a pair of long running cat-file processes when accessing object content.
One for iterating over object metadata with --batch-check, and the other to grab object contents with --batch.

However, if we had --batch-command, we wouldn't need to keep both processes around, and instead just have one --batch-command process where we can flip between getting object info, and getting object contents.
Since we have a pair of cat-file processes per repository, this means we can get rid of roughly half of long lived git cat-file processes.
Given there are many repositories being accessed at any given time, this can lead to huge savings.

git cat-file --batch-command will enter an interactive command mode whereby the user can enter in commands and their arguments that get queued in memory:

    <command1> [arg1] [arg2] LF
    <command2> [arg1] [arg2] LF

When --buffer mode is used, commands will be queued in memory until a flush command is issued that executes them:

    flush LF

The reason for a flush command is that when a consumer process (A) talks to a git cat-file process (B) and interactively writes to and reads from it in --buffer mode, (A) needs to be able to control when the buffer is flushed to stdout.

Currently, from (A)'s perspective, the only way is to either

1. kill (B)'s process
2. send an invalid object to stdin.

1. is not ideal from a performance perspective as it will require spawning a new cat-file process each time, and 2. is hacky and not a good long term solution.
With this mechanism of queueing up commands and letting (A) issue a flush command, process (A) can control when the buffer is flushed and can guarantee it will receive all of the output when in --buffer mode.
--batch-command also will not allow (B) to flush to stdout until a flush is received.

This patch adds the basic structure for adding commands, which can be extended in the future to add more commands.
It also adds the following two commands (on top of the flush command):

    contents <object> LF
    info <object> LF

The contents command takes an <object> argument and prints out the object contents.
The info command takes an <object> argument and prints out the object metadata.

These can be used in the following way with --buffer:

    info <object> LF
    contents <object> LF
    contents <object> LF
    info <object> LF
    flush LF
    info <object> LF
    flush LF

When used without --buffer:

    info <object> LF
    contents <object> LF
    contents <object> LF
    info <object> LF
    info <object> LF

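As a rough illustration of the command stream described in the commit message, the following sketch prints an info and a contents command for each object name, followed by a single flush. The function name batch_command_stream is hypothetical, and the pipeline shown in the comment assumes Git 2.36 or later with a repository as the working directory.

```shell
# Hypothetical helper: emit "info" and "contents" for every argument,
# then one "flush" so a cat-file running with --buffer writes its output.
batch_command_stream() {
    for obj in "$@"; do
        printf 'info %s\n' "$obj"
        printf 'contents %s\n' "$obj"
    done
    printf 'flush\n'
}

# Assumed usage inside a repository:
#   batch_command_stream HEAD HEAD~1 | git cat-file --batch-command --buffer
```

In a one-shot pipeline like this the flush is not strictly needed, since the output is also written when stdin closes. It matters for a long-running consumer that keeps the pipe open and waits for answers.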
git cat-file now includes in its man page:

--batch-command
--batch-command=<format>
    Enter a command mode that reads commands and arguments from stdin. May only be combined with --buffer, --textconv or --filters. In the case of --textconv or --filters, the input lines also need to specify the path, separated by whitespace. See the section BATCH OUTPUT below for details.
--batch-command recognizes the following commands:

contents <object>
    Print object contents for object reference <object>. This corresponds to the output of --batch.

info <object>
    Print object info for object reference <object>. This corresponds to the output of --batch-check.

flush
    Used with --buffer to execute all preceding commands that were issued since the beginning or since the last flush was issued. When --buffer is used, no output will come until a flush is issued. When --buffer is not used, commands are flushed each time without issuing flush.
git cat-file now includes in its man page:

When --batch-command is given, cat-file will read commands from stdin, one per line, and print information based on the command given. With --batch-command, the info command followed by an object will print information about the object the same way --batch-check would, and the contents command followed by an object prints contents in the same way --batch would.
git cat-file now includes in its man page:

If --batch is specified, or if --batch-command is used with the contents command, the object information is followed by the object contents (consisting of %(objectsize) bytes), followed by a newline.
With Git 2.38 (Q3 2022), operating modes like "--batch" of the "git cat-file" command learned to take NUL-terminated input, instead of one-item-per-line.
See commit db9d67f, commit 3639fef (22 Jul 2022) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 1e92768, 05 Aug 2022)
builtin/cat-file.c: support NUL-delimited input with -z
Signed-off-by: Taylor Blau
When callers are using cat-file via one of the stdin-driven --batch modes, all input is newline-delimited.
This presents a problem when callers wish to ask about, e.g. tree-entries that have a newline character present in their filename.

To support this niche scenario, introduce a new -z mode to the --batch, --batch-check, and --batch-command suite of options that instructs cat-file to treat its input as NUL-delimited, allowing the individual commands themselves to have newlines present.

The refactoring here is slightly unfortunate, since we turn loops like:

    while (strbuf_getline(&buf, stdin) != EOF)

into:

    while (1) {
        int ret;
        if (opt->nul_terminated)
            ret = strbuf_getline_nul(&input, stdin);
        else
            ret = strbuf_getline(&input, stdin);

        if (ret == EOF)
            break;
    }

It's tempting to think that we could use strbuf_getwholeline() and specify either \n or \0 as the terminating character.
But for input on platforms that include a CR character preceding the LF, this wouldn't quite be the same, since strbuf_getline(...) will trim any trailing CR, while strbuf_getwholeline(&buf, stdin, '\n') will not.
git cat-file now includes in its man page:

-z
    Only meaningful with --batch, --batch-check, or --batch-command; input is NUL-delimited instead of newline-delimited.
With Git 2.42 (Q3 2023), "git cat-file --batch" and friends learned -Z that uses NUL delimiter for both input and output.
See commit f79e188, commit 3217f52, commit af35e56, commit b116c77, commit c7309f6 (06 Jun 2023) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit a9ea4c2, 22 Jun 2023)
cat-file: add option '-Z' that delimits input and output with NUL
Co-authored-by: Toon Claes
Signed-off-by: Patrick Steinhardt
In db9d67f ("builtin/cat-file.c: support NUL-delimited input with -z", 2022-07-22, Git v2.38.0-rc0 -- merge listed in batch #10), we have introduced a new mode to read the input via NUL-delimited records instead of newline-delimited records.
This allows the user to query for revisions that have newlines in their path component.
While unusual, such queries are perfectly valid and thus it is clear that we should be able to support them properly.

Unfortunately, the commit only changed the input to be NUL-delimited, but didn't change the output at the same time.
While this is fine for queries that are processed successfully, it is less so for queries that aren't.
In the case of missing commits for example the result can become entirely unparsable:

    $ printf "7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10\n1234567890\n\n\commit000" |
        git cat-file --batch -z
    7ce4f05bae8120d9fa258e854a8669f6ea9cb7b1 blob 10
    1234567890

    commit missing

This is of course a crafted query that is intentionally gaming the deficiency, but more benign queries that contain newlines would have similar problems.
Ideally, we should have also changed the output to be NUL-delimited when -z is specified to avoid this problem.
As the input is NUL-delimited, it is clear that the output in this case cannot ever contain NUL characters by itself.
Furthermore, Git does not allow NUL characters in revisions anyway, further stressing the point that using NUL-delimited output is safe.
The only exception is of course the object data itself, but as git-cat-file(1) prints the size of the object data clients should read until that specified size has been consumed.

But even though -z has only been introduced a few releases ago in Git v2.38.0, changing the output format retroactively to also NUL-delimit output would be a backwards incompatible change.
And while one could make the argument that the output is inherently broken already, we need to assume that there are existing users out there that use it just fine given that revisions containing newlines are quite exotic.

Instead, introduce a new option -Z that switches to NUL-delimited input and output.
While this new option could arguably only switch the output format to be NUL-delimited, the consequence would be that users have to always specify both -z and -Z when the input may contain newlines.
On the other hand, if the user knows that there never will be newlines in the input, they don't have to use either of those options.
There is thus no usecase that would warrant treating input and output format separately, which is why we instead opt to "do the right thing" and have -Z mean to NUL-terminate both formats.

The old -z option is marked as deprecated with a hint that its output may become unparsable.
It is thus hidden both from the synopsis as well as the command's help output.
git cat-file now includes in its man page:

-Z
    Only meaningful with --batch, --batch-check, or --batch-command; input and output is NUL-delimited instead of newline-delimited.
git cat-file now includes in its man page:

newline-delimited. This option is deprecated in favor of -Z as the output can otherwise be ambiguous.
git cat-file now includes in its man page:

Alternatively, when -Z is passed, the line feeds in any of the above examples are replaced with NUL terminators. This ensures that output will be parsable if the output itself would contain a linefeed and is thus recommended for scripting purposes.
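For scripting, the NUL-delimited input that -Z expects can be produced with printf. The sketch below is only an illustration: the function name nul_records is hypothetical, and the pipeline in the comment assumes Git 2.42 or later and a repository as the working directory.

```shell
# Hypothetical helper: write each argument followed by a NUL byte,
# so an argument may itself contain a newline.
nul_records() {
    printf '%s\0' "$@"
}

# Assumed usage inside a repository; the trailing tr only makes the
# NUL-delimited output readable on a terminal:
#   nul_records HEAD 'HEAD:README' | git cat-file --batch-check -Z | tr '\0' '\n'
```

A real consumer would split the output on NUL instead of translating it back to newlines, since that translation reintroduces the ambiguity -Z is meant to avoid.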