bash shell sh

Why test for equality in sh scripts in an indirect way?


I often see this construct in sh scripts:

if [ "z$x" = z ]; then echo x is empty; fi

Why don't they just write it like this?

if [ "$x" = "" ]; then echo x is empty; fi

Solution

  • the z in if [ "z$x" = z ] is a "guard". it mainly guards against bugs in old shells.

    the bugs have long been fixed. posix parsing rules and quotes make the guard largely unnecessary.

    if you write your script for modern shells you are probably fine without the guard. but if your script has to work on legacy unix systems then the guard is practically mandatory.

    in any case: quote your variables. the guard cannot protect against missing quotes.


    first about the expression itself

    this [ "z$x" = z ] checks if $x is empty

    the guard also works if we test against other literals [ "z$x" = zliteral ]

    it also works to test equality of two variables [ "z$var1" = "z$var2" ]

    there is nothing special about z. it is just a string. it could also be guard: [ "guard$var" = guard ]

    side note:

    on modern shells it is better to test for an empty variable like this: [ -z "$var" ]

    you can also flip the literal and the variable: [ literal = "$var" ]. this solves some of the problems but not all of them.
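
    the forms above can be tried out with a small sketch. the function names is_empty and same_string are made up for this example, they are not standard commands.

    ```shell
    # hypothetical helpers, the names are made up for this sketch

    # guard against a literal: succeeds if the argument is the empty string
    is_empty() {
        [ "z$1" = z ]
    }

    # guard on both sides: succeeds if the two arguments are equal
    same_string() {
        [ "z$1" = "z$2" ]
    }

    x=''
    if is_empty "$x"; then echo "x is empty"; fi

    x='-f'
    if same_string "$x" '-f'; then echo "x is -f"; fi

    # the forms from the side note, without a guard
    var=''
    if [ -z "$var" ]; then echo "var is empty according to -z"; fi
    var='literal'
    if [ literal = "$var" ]; then echo "flipped form matches"; fi
    ```

    wrapping the comparison in a function keeps the guard in one place, so it is easy to drop once legacy shells no longer matter.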


    about the old bugs

    too many to list. the short version is that some old shells got confused when the variable expanded to -f or ! or another string with a special meaning.

    for example $x is -f

    then this [ "$x" = "" ]

    becomes this [ "-f" = "" ]

    some old shells were too eager to interpret -f as the test for a regular file. but then two more arguments follow while -f expects just one. so the shell reports an error. worse: it carries on as if the test were false, even if the comparison was against -f itself and should have been true.

    with the guard it becomes [ "z-f" = z ]. the guard shields the problematic characters from getting misinterpreted.

    these bugs have all been fixed. (fingers crossed)

    posix parsing rules, if implemented correctly, prevent this confusion.
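
    a rough way to check whether the shell at hand still has such a bug is to compare the guarded and the unguarded result for a few awkward values. this is only a sketch: the function name probe and the list of values are made up for this example, and a clean run says nothing about values that are not in the list.

    ```shell
    # hypothetical probe: does the unguarded test agree with the guarded one?
    probe() {
        # unguarded comparison, error messages of a confused shell are hidden
        if [ "$1" = "$2" ] 2>/dev/null; then plain=equal; else plain=different; fi
        # guarded comparison, taken as the reference result
        if [ "z$1" = "z$2" ]; then guarded=equal; else guarded=different; fi
        if [ "$plain" = "$guarded" ]; then
            echo "ok"
        else
            echo "MISMATCH"
        fi
    }

    for value in '-f' '!' '(' '-n' '='; do
        echo "value $value against empty: $(probe "$value" '')"
        echo "value $value against itself: $(probe "$value" "$value")"
    done
    ```

    a shell that follows the posix parsing rules should print ok on every line.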


    about posix parsing rules

    this thing [ "$x" = "" ] is a test with three arguments. (actually four with the closing ] but that is practically syntax so we do not count that).

    one of the posix parsing rules for three arguments states: if the middle argument is a binary operator such as = (posix calls it a binary primary) then the left and right arguments are the values to compare. it does not matter if they are -f or ! or (.

    so the posix parsing rules prevent this [ "-f" = "" ] from being problematic.

    posix parsing rules define how the arguments are to be interpreted from 0 arguments up to 4 arguments. the 4 argument variant is the 3 argument variant with an extra ! as first argument denoting negation.

    posix also states that for more than 4 arguments the result is unspecified. (-a and -o, the main way to end up with more than 4 arguments, are deprecated as well. but shells will likely keep supporting them for a long time.)

    that means as long as we stay within 4 arguments there is no danger of misinterpretation of strings like -f.

    but the posix parsing rules are only abstract rules. the shell must actually implement the rules. and the implementation might contain bugs.
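
    the rules can be seen in action with a few one-liners. a sketch, assuming a shell that implements the rules correctly. the helper name report is made up for this example.

    ```shell
    # hypothetical helper: run a command and print whether it succeeded
    report() {
        label=$1
        shift
        if "$@"; then echo "$label: true"; else echo "$label: false"; fi
    }

    # three arguments, the middle one is = : the outer two are plain strings
    report 'three args, -f on both sides' [ "-f" = "-f" ]
    report 'three args, ! against empty' [ "!" = "" ]
    report 'three args, ( on both sides' [ "(" = "(" ]

    # four arguments, the first one is ! : negated three argument test
    report 'four args, negated' [ ! "-f" = "" ]

    # one argument: true if it is not empty, even if it looks like an operator
    report 'one arg, -f' [ "-f" ]
    ```

    in every call the number of arguments alone decides how -f, ! and ( are read.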


    about quotes

    quotes prevent word splitting. word splitting can change the number of arguments. the posix parsing rules rely on the number of arguments. so quotes are required for the posix parsing rules to be effective.

    example with guard but without quotes: [ z$var1 = z$var2 ]

    if $var1 comes from outside, an attacker can set it to the value " = z -o z" (note the leading space).

    the test becomes [ z = z -o z = z$var2 ]. on a shell that supports -o this always evaluates to true, regardless of the value of $var2.

    with quotes [ "z$var1" = "z$var2" ] will become [ "z = z -o z" = "z$var2" ]. this is still a three argument case according to posix parsing rules and thus unproblematic.

    always quote your variables. quotes are super important and prevent not just this problem.

    but: quotes cannot help against bugs that misinterpret -f, because a quoted "-f" is still -f. so quote and guard if you cannot trust the shell's implementation of the posix parsing rules.
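
    the attack can be replayed with a small sketch. the function names and the value of var2 are made up for this example. the unquoted result depends on the shell still supporting -o, which bash and dash do.

    ```shell
    # hypothetical setup: var2 holds the value the input is compared against
    var2='expected'

    unquoted_match() {
        # deliberately broken: no quotes, so $1 is split into several words
        [ z$1 = z$var2 ] 2>/dev/null
    }

    quoted_match() {
        # quotes keep each side a single argument
        [ "z$1" = "z$var2" ]
    }

    # the attacker controlled value, note the leading space
    var1=' = z -o z'

    if unquoted_match "$var1"; then echo "unquoted: match"; else echo "unquoted: no match"; fi
    if quoted_match "$var1"; then echo "quoted: match"; else echo "quoted: no match"; fi
    ```

    the quoted variant only reports a match when the input really equals $var2.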


    bonus: technical background

    the shell if syntax is roughly like this: if command ; then command ; fi (with an optional else part)

    the conditional command is just a command. nothing special about it.

    that means this part [ "z$x" = z ] is a command. more specifically this [ is a command. the rest are the arguments.

    test is sort of an alias to [. there is /usr/bin/test but most shells have a builtin test. the distinction does not matter here. what matters is that when the command gets executed the arguments undergo the typical variable expansions and word splittings.
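
    that [ and test are ordinary commands, and that if only looks at the exit status, can be checked directly. a minimal sketch. the exact wording printed by type differs between shells.

    ```shell
    x=''

    # test and [ do the same job, [ just wants a closing ]
    if test "z$x" = z; then echo "test says: x is empty"; fi
    if [ "z$x" = z ]; then echo "[ says: x is empty"; fi

    # any other command works as a condition too
    if printf '%s' "$x" | grep -q .; then
        echo "grep found a character in x"
    else
        echo "grep found nothing in x"
    fi

    # ask the shell what it will run for each name
    type test
    type [
    ```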

    by the time test is running all it has are positional arguments. those arguments might be a -f or a =. apart from the position, nothing tells test whether an argument is meant as a value or as an operator.
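
    what test actually receives can be made visible with a stand-in function that only prints its arguments. a sketch, the name show_args is made up for this example.

    ```shell
    # hypothetical stand-in for test: prints the arguments it received
    show_args() {
        printf '%d argument(s):' "$#"
        for arg in "$@"; do printf ' <%s>' "$arg"; done
        printf '\n'
    }

    x='a b'
    show_args z$x = z      # unquoted: x is split into two words
    show_args "z$x" = z    # quoted: x stays one argument

    x=''
    show_args $x = ""      # unquoted and empty: the argument disappears
    show_args "$x" = ""    # quoted and empty: an empty argument remains
    ```

    the first and third call no longer have three arguments, so the three argument rule does not apply to them.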

    posix parsing rules and quotes and guards are the things we do to help test deduce the meaning of the positional arguments correctly. (with "we" i mean both the script programmer and the developers of the shell)

    guards being the most banal: just tack a safe character in front to prevent things like -f or ! from being confused as operators.

    some shells like bash have advanced test constructs like [[. this is not a command but shell syntax, so the shell has more control over how expansions and special characters are treated. but [[ is not posix and has its own set of peculiarities.


    summary

    old shells had problems if the variable expands to -f or other special characters. with a guard it becomes z-f and thus unproblematic.

    posix parsing rules define how arguments should be interpreted and theoretically avoid all confusion. but implementations might still contain bugs.

    quotes protect against word splitting and are necessary for posix parsing rules to be effective. but quotes do not protect against misinterpretation of -f.

    in modern shells all misinterpretation bugs have been fixed. (fingers crossed)

    if your script should run on modern systems then you can skip the guard.

    if your script should run on legacy systems then use the guard.

    in any case: quote your variables.


    links

    https://www.in-ulm.de/~mascheck/various/test/ a quite thorough comparison of which shell had problems with which special character.

    https://www.vidarholen.net/contents/blog/?p=1035 an overview of which bug got fixed when

    http://mywiki.wooledge.org/BashPitfalls#A.5B_.24foo_.3D_.22bar.22_.5D about quoting and guarding

    https://pubs.opengroup.org/onlinepubs/9799919799/utilities/test.html posix parsing rules

    https://www.gnu.org/software/bash/manual/html_node/Shell-Expansions.html all the things that happen with the command line before the command gets executed

    https://unix.stackexchange.com/questions/11454/what-is-the-difference-between-a-builtin-command-and-one-that-is-not difference builtin and system command

    https://unix.stackexchange.com/questions/183745/why-is-a-shell-builtin-and-a-shell-keyword difference of [ and [[ regarding builtin or keyword

    https://mywiki.wooledge.org/BashFAQ/031 difference [ and [[ on usage