
Strange bash regex behaviour with [:upper:] and [:lower:] and directories


Passing unquoted regex terms such as [:upper:] and [:lower:] to tr and grep can return results I can't explain if the current working directory contains directories with the single-character names 'l' and 'p'.

Using bash, start in an empty directory, and use tr to convert from upper to lower case:

mkdir test
cd test
echo upper | tr [:upper:] [:lower:]
upper
echo UPPER | tr [:upper:] [:lower:]
upper

All good, and as expected. Now again with a couple of empty folders, first 'l' and then additionally 'p':

mkdir l
echo upper | tr [:upper:] [:lower:]
upper

mkdir p
echo upper | tr [:upper:] [:lower:]
uller

echo UPPER | tr [:upper:] [:lower:]
UPPER

There's clearly some file globbing going on here that only happens in some limited circumstances with single-character folder names. Can anyone explain this behaviour?

The solution is simple enough: enclose the regex terms in quotes. Still, I'd be really interested in an explanation for the behaviour.

echo upper | tr '[:upper:]' '[:lower:]'
upper

Solution

  • As you point out, there is some globbing going on. Because the arguments are unquoted, the shell treats [:upper:] as a glob before tr ever runs. It is a bracket expression equivalent to [epru:], which matches a single-character name of e, p, r, u, or :. If any such names exist in the current directory, [:upper:] is replaced with all of the names that match. If no such name exists (and the shell is using its default options, with nullglob and failglob unset), no expansion takes place and the string is passed through unchanged as the literal [:upper:]. (That unmatched case is the one place it differs from [epru:], which would be passed through as the literal [epru:].)

    Likewise, [:lower:] is the same glob as [elorw:]. When no name matches either glob, tr sees the arguments [:upper:] and [:lower:]. If the name l is in the current directory, tr sees the arguments [:upper:] and l, so all uppercase letters get changed to l. If the names p and l exist, tr sees the arguments p and l, and just changes every p to l. If the name r exists as well as p and l, then [:upper:] expands to p r and [:lower:] expands to l r, so tr sees four arguments (p, r, l, and r) and complains about extra operands.

    But note that the shell's behavior may also be influenced by things like GLOBIGNORE, extglob, dotglob, failglob, nullglob, nocaseglob, etc. Rather than worrying about the details of such things, it is safest to quote the strings so the shell does not attempt to expand the glob.
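
To see what tr actually receives in each of the cases described above, a small sketch along these lines may help. It assumes bash with its default globbing options. show_args is a hypothetical helper (not something from the question) that simply prints each argument it is given, and the working directory is a throwaway one created with mktemp.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so existing files cannot affect the globs.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical helper: print every argument on its own line, in angle brackets.
show_args() {
    if [ "$#" -eq 0 ]; then
        echo "(no arguments)"
    else
        printf '<%s>\n' "$@"
    fi
}

echo "empty directory:"
show_args [:upper:] [:lower:]        # <[:upper:]> <[:lower:]>

echo "empty directory with nullglob:"
(
    # Subshell, so the option does not leak into the rest of the script.
    shopt -s nullglob
    show_args [:upper:] [:lower:]    # (no arguments)
)

mkdir l
echo "after mkdir l:"
show_args [:upper:] [:lower:]        # <[:upper:]> <l>

mkdir p
echo "after mkdir p:"
show_args [:upper:] [:lower:]        # <p> <l>

mkdir r
echo "after mkdir r:"
show_args [:upper:] [:lower:]        # <p> <r> <l> <r>

echo "quoted, with l, p and r still present:"
show_args '[:upper:]' '[:lower:]'    # <[:upper:]> <[:lower:]>

# Clean up the throwaway directory.
cd / && rm -rf "$demo_dir"
```

The quoted call at the end is the only one whose arguments do not depend on the directory contents or on shell options, which is why quoting is the safe choice.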