I often see this construct in sh scripts:
if [ "z$x" = z ]; then echo x is empty; fi
Why don't they just write it like this?
if [ "$x" = "" ]; then echo x is empty; fi
the z in if [ "z$x" = z ] is a "guard". it mainly guarded against bugs in old shells.
the bugs have long been fixed. posix parsing rules and quotes make the guard largely unnecessary.
if you write your script for modern shells you are probably fine without the guard. but if your script has to work on legacy unix systems then the guard is practically mandatory.
in any case: quote your variables. the guard cannot protect against missing quotes.
first about the expression itself
this [ "z$x" = z ] checks if $x is empty.
the guard also works if we test against other literals: [ "z$x" = zliteral ]
it also works to test equality of two variables: [ "z$var1" = "z$var2" ]
there is nothing special about z. it is just a string. it could also be guard: [ "guard$var" = guard ]
side note:
on modern shells it is better to test whether var is empty like this: [ -z "$var" ]
you can also flip the literal and the variable: [ literal = "$var" ]
but while this solves some problems it does not solve all of them.
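a minimal sketch of the three forms side by side, assuming a posix-conforming sh. the result variable names (guarded, plain, dash_z) are made up for this illustration.

```sh
# three ways to test for an empty variable.
# on a posix-conforming shell all three agree.
x=""

if [ "z$x" = z ]; then guarded="empty"; else guarded="not empty"; fi
if [ "$x" = "" ]; then plain="empty"; else plain="not empty"; fi
if [ -z "$x" ]; then dash_z="empty"; else dash_z="not empty"; fi

echo "guarded: $guarded, plain: $plain, -z: $dash_z"
```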
about the old bugs
too many to list. the short version is that some old shells get confused when the variable expands to -f or ! or another character with special meaning.
for example, if $x is -f
then this [ "$x" = "" ]
becomes this [ "-f" = "" ]
some old shells were too eager in interpreting -f as a test for file existence. but then there are two more arguments while -f expects just one. so the shell reports an error. but worse: it carries on as if the test were false, even if we actually did compare against -f.
with the guard it becomes [ "z-f" = z ]. the guard shields the problematic characters from getting misinterpreted.
these bugs have all been fixed. (fingers crossed)
posix parsing rules, if implemented correctly, prevent this confusion.
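to check how a given shell behaves, something like the following sketch can be run. it assumes nothing beyond plain sh; the result variable names are made up for this illustration. on a shell that follows the posix rules, the guarded and the unguarded form give the same answers.

```sh
# x holds a string that looks like a unary operator.
x="-f"

# unguarded comparison against the empty string
if [ "$x" = "" ]; then plain="empty"; else plain="not empty"; fi

# guarded comparison against the empty string
if [ "z$x" = z ]; then guarded="empty"; else guarded="not empty"; fi

# unguarded comparison against the literal -f itself
if [ "$x" = "-f" ]; then literal="match"; else literal="no match"; fi

echo "plain: $plain, guarded: $guarded, literal: $literal"
```

if the unguarded lines print an error or disagree with the guarded one, the shell is one of those that needs the guard.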
about posix parsing rules
this thing [ "$x" = "" ] is a test with three arguments. (actually four with the closing ] but that is practically syntax so we do not count it.)
one of the posix parsing rules for three arguments states: if the middle argument is a binary operator (such as =) then the left and right arguments are the values to compare. it does not matter if they are -f or ! or (.
so the posix parsing rules prevent this [ "-f" = "" ] from being problematic.
posix parsing rules define how the arguments are to be interpreted from 0 arguments up to 4 arguments. the 4 argument variant is the 3 argument variant with an extra ! as first argument denoting negation.
they also state that for more than 4 arguments the result is unspecified. (-a and -o, which are the only way to get more than 4 arguments, are deprecated as well. but shells will likely still support them for a long time.)
that means as long as we stay within 4 arguments there is no danger of misinterpretation of strings like -f.
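the three argument rule can be probed with a small loop. this is only a sketch: the list of values and the failures counter are picked for this illustration, and it only exercises the three argument case.

```sh
# every value below looks like an operator or like syntax to test.
# with three arguments and = in the middle, each must be treated as a plain string.
failures=0
for v in '-f' '!' '(' ')' '-a' '-o' '='; do
    # a value must be equal to itself
    if [ "$v" = "$v" ]; then
        echo "ok: $v equals itself"
    else
        failures=$((failures + 1))
    fi
    # and none of these values is the empty string
    if [ "$v" = "" ]; then
        failures=$((failures + 1))
    fi
done
echo "failures: $failures"
```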
but the posix parsing rules are only abstract rules. the shell must actually implement the rules. and the implementation might contain bugs.
about quotes
quotes prevent word splitting. word splitting might change number of arguments. posix parsing rules rely on number of arguments. so quotes are required for the posix parsing rules to be effective.
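to see how quoting changes the argument count, here is a small sketch. count_args is a hypothetical helper written for this illustration; it just prints how many arguments it received, which is what [ would see (minus the closing ]).

```sh
# hypothetical helper: print the number of arguments received
count_args() { echo "$#"; }

x=""
unquoted_empty=$(count_args $x = "")   # $x vanishes: only = and "" remain
quoted_empty=$(count_args "$x" = "")   # "", = and ""

x="a b"
unquoted_words=$(count_args $x = "")   # a, b, = and ""
quoted_words=$(count_args "$x" = "")   # "a b", = and ""

echo "empty:     unquoted=$unquoted_empty quoted=$quoted_empty"
echo "two words: unquoted=$unquoted_words quoted=$quoted_words"
```

only the quoted forms reliably end up in the three argument case.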
example with guard but without quotes: [ z$var1 = z$var2 ]
if $var1 comes from outside, an attacker can assign the value ' = z -o z' (note the leading space).
the term becomes [ z = z -o z = z$var2 ]. this always evaluates to true regardless of the value of $var2.
with quotes [ "z$var1" = "z$var2" ] will become [ "z = z -o z" = "z$var2" ]. this is still a three argument case according to posix parsing rules and thus unproblematic.
always quote your variables. quotes are super important and prevent not just this problem.
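the injection above can be reproduced with the following sketch. the value of var2 and the result variable names are made up for this illustration. note that the unquoted test has more than 4 arguments, so strictly speaking its result is unspecified; shells that still support -o, such as bash and dash, evaluate it as true.

```sh
var1=' = z -o z'   # attacker-controlled value, note the leading space
var2='other'       # arbitrary value that differs from var1

# guard but no quotes (on purpose): expands to [ z = z -o z = zother ]
if [ z$var1 = z$var2 ] 2>/dev/null; then unquoted="true"; else unquoted="false"; fi

# guard and quotes: a plain three-argument string comparison
if [ "z$var1" = "z$var2" ]; then quoted="true"; else quoted="false"; fi

echo "unquoted: $unquoted, quoted: $quoted"
```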
but: quotes cannot help against bugs misinterpreting -f. because quoted "-f" is still -f. so quote and guard if you cannot trust the shell's implementation of the posix parsing rules.
bonus: technical background
the shell if syntax is roughly like this: if command ; then command ; fi (with optional else part)
the conditional command is just a command. nothing special about it.
that means this part [ "z$x" = z ] is a command. more specifically this [ is a command. the rest are the arguments.
test is the same command under another name. the only difference is that [ requires a closing ] as its last argument. there is /usr/bin/test but most shells have a builtin test. the distinction does not matter here. what matters is that when the command gets executed the arguments undergo the typical variable expansions and word splittings.
by the time test is running all it has are positional arguments. and those arguments might be a -f or a =. and other than the position there is no information attached as to whether an argument is meant as a value or as an operator.
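a small sketch showing that test and [ are interchangeable as the command after if, and that [ insists on its closing ]. the result variable names are made up for this illustration.

```sh
x=""

# same check, once spelled test and once spelled [
if test "z$x" = z; then via_test="empty"; else via_test="not empty"; fi
if [ "z$x" = z ]; then via_bracket="empty"; else via_bracket="not empty"; fi

# without the closing ] the [ command fails with an error,
# so if takes the else branch
if [ "z$x" = z 2>/dev/null; then no_bracket="accepted"; else no_bracket="rejected"; fi

echo "test: $via_test, [: $via_bracket, missing ]: $no_bracket"
```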
posix parsing rules and quotes and guards are the things we do to help test deduce the meaning of the positional arguments correctly. (with "we" i mean both the script programmer and the developers of the shell)
guards being the most banal: just tack a safe character in front to prevent things like -f or ! from being confused with operators.
some shells like bash have advanced test constructs like [[. this is not a command but shell syntax, so the shell has more control over how to treat expansion and special characters. but [[ is not posix and has its own set of peculiarities.
summary
old shells had problems if the variable expands to -f or other special characters. with a guard it becomes z-f and thus unproblematic.
posix parsing rules define how arguments should be interpreted and theoretically avoid all confusion. but implementations might still contain bugs.
quotes protect against word splitting and are necessary for posix parsing rules to be effective. but quotes do not protect against misinterpretation of -f.
in modern shells all misinterpretation bugs have been fixed. (fingers crossed)
if your script should run on modern systems then you can skip the guard.
if your script should run on legacy systems then use the guard.
in any case: quote your variables.
links
https://www.in-ulm.de/~mascheck/various/test/ a quite thorough comparison of which shell had problems with which special character.
https://www.vidarholen.net/contents/blog/?p=1035 an overview of which bug got fixed when
http://mywiki.wooledge.org/BashPitfalls#A.5B_.24foo_.3D_.22bar.22_.5D about quoting and guarding
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/test.html posix parsing rules
https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html all the things that happen with the command line before the command gets executed
https://unix.stackexchange.com/questions/11454/what-is-the-difference-between-a-builtin-command-and-one-that-is-not difference builtin and system command
https://unix.stackexchange.com/questions/183745/why-is-a-shell-builtin-and-a-shell-keyword difference of [ and [[ regarding builtin or keyword
https://mywiki.wooledge.org/BashFAQ/031 difference of [ and [[ on usage