I have a text file that contains something like this:
abc 123, comma
the quick brown fox
jumped over the lazy dog
comma, comma
I wrote a script
for i in `cat file`
do
echo $i
done
For some reason, the script doesn't print the file line by line but breaks it up at the commas as well as at the newlines. Why is cat or for blah in `cat xyz` doing this, and how can I make it NOT do this? I know I can use a
while read line
do
blah blah blah
done < file
but I want to know why cat or the for var in is doing this, to further my understanding of Unix commands. cat's man page didn't help me, and looking at for or looping in the bash manual (http://www.gnu.org/software/bash/manual/bashref.html) didn't yield any answers. Thanks in advance for your help.
The problem is not in cat, nor in the for loop per se; it is in the use of back quotes. When you write either:
for i in `cat file`
or (better):
for i in $(cat file)
or (in ksh, zsh or bash¹):
for i in $(<file)
the shell executes the command and captures its output as a string, removes trailing newline characters (and, in bash, all NUL bytes), splits the result into words at the characters in $IFS and (except in zsh) performs globbing (also known as filename generation or pathname expansion) on the resulting words. By default $IFS contains space, tab and newline, which is why the loop breaks at spaces as well as at newlines. If you want whole lines in $i, you either have to fiddle with IFS or use the while loop. The while loop is better if there's any danger that the files processed will be large: it doesn't have to read the whole file into memory at once, it doesn't perform globbing and, unlike the versions using $(...), it doesn't skip empty lines.
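The three behaviours described above can be compared side by side. This is only a small sketch: the scratch file, its contents and the counter names are made up for illustration, and it assumes a POSIX-style sh with mktemp available.

```shell
#!/bin/sh
# Scratch file for the experiment; name and contents are illustrative only.
tmp=$(mktemp)
printf '%s\n' 'abc 123, comma' '' 'the quick brown fox' > "$tmp"

# 1. Default IFS (space, tab, newline): one item per blank-separated word.
word_items=0
for w in $(cat "$tmp"); do
    word_items=$((word_items + 1))
done

# 2. IFS reduced to a newline, globbing off: one item per non-empty line.
old_ifs=$IFS
IFS='
'
set -o noglob
line_items=0
for l in $(cat "$tmp"); do
    line_items=$((line_items + 1))
done
set +o noglob
IFS=$old_ifs

# 3. while read: one item per line, empty lines included.
read_items=0
while IFS= read -r l; do
    read_items=$((read_items + 1))
done < "$tmp"

printf 'default IFS:  %d items\n' "$word_items"
printf 'newline IFS:  %d items\n' "$line_items"
printf 'while read:   %d items\n' "$read_items"
rm -f "$tmp"
```

With this sample file the counts should come out as 7, 2 and 3. The comma stays attached to 123, so the split happens at the blanks rather than at the commas. Restricting IFS to a newline and turning off globbing, as the snippet that follows does, is what gets the for loop down to one item per non-empty line.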
IFS='
'
set -o noglob # disable globbing
for i in $(<file)
do printf '%s\n' "$i"
done
The quotes around the "$i" are generally a good idea. In this context, with the modified $IFS and globbing disabled, they aren't actually critical, but good habits are good habits even so. printf is better than echo, as echo would output nothing or an empty line for input lines such as -n, -nene or -eee and, depending on the echo implementation and/or environment, could mangle backslashes. Both the quoting and the choice of printf matter in the following script:
old="$IFS"
IFS='
'
set -o noglob
for i in $(<file)
do
(
IFS="$old"
set +o noglob
printf '%s\n' "$i"
)
done
when the data file contains tabs or multiple spaces (both of which are in the default value of $IFS), wildcards, or leading or trailing whitespace:
$ cat file
abc 123
foo
-Enee
/e* /b*
$
Output:
$ sh bq.sh
abc 123
foo
-Enee
/e* /b*
$
With echo and without the double quotes:
$ cat bq.sh
old="$IFS"
IFS='
'
set -o noglob
for i in $(<file)
do
(
IFS="$old"
set +o noglob
echo $i
)
done
$ sh bq.sh
abc 123
foo
/etc /bin /boot
$
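The two failure modes in that run, echo swallowing option-like lines and unquoted variables being glob-expanded, can be isolated in a short experiment. This is a sketch under assumptions: the sample value, the scratch directory and the file names one and two are invented for the demonstration, and what echo prints for -n depends on the shell's echo implementation.

```shell
#!/bin/sh
# A line that looks like an echo option.
sample='-n'
via_echo=$(echo "$sample")             # empty in many shells: -n is taken as an option
via_printf=$(printf '%s\n' "$sample")  # the literal text -n

printf 'echo gave:   [%s]\n' "$via_echo"
printf 'printf gave: [%s]\n' "$via_printf"

# A line that contains a wildcard, pointing at a scratch directory
# holding two hypothetical files.
dir=$(mktemp -d)
: > "$dir/one"
: > "$dir/two"
pattern="$dir/*"

set -- $pattern          # unquoted: the pattern expands to the matching paths
unquoted_count=$#
set -- "$pattern"        # quoted: the pattern stays one literal word
quoted_count=$#

printf 'unquoted: %d word(s)\n' "$unquoted_count"
printf 'quoted:   %d word(s)\n' "$quoted_count"
rm -rf "$dir"
```

The printf result and the two word counts do not depend on the shell; only the echo line varies between implementations, which is the reason to avoid echo for arbitrary data.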
For the while read loop, the syntax should be:
while IFS= read -r line
do
printf '%s\n' "$line"
done < file
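A quick way to see what the IFS= and -r in this loop protect against is to read the same line with and without them. The scratch file and the variable names safe and plain are assumptions made for this sketch, which also assumes the default value of $IFS.

```shell
#!/bin/sh
# One line with leading and trailing spaces and a literal backslash.
tmp=$(mktemp)
printf '%s\n' '  indented \t text  ' > "$tmp"

# With IFS= and -r the line arrives untouched.
IFS= read -r safe < "$tmp"

# With neither, surrounding blanks are stripped and the backslash is consumed.
read plain < "$tmp"

printf 'IFS= read -r: [%s]\n' "$safe"
printf 'plain read:   [%s]\n' "$plain"
rm -f "$tmp"
```

The brackets in the output make the lost whitespace visible: the first line keeps both the padding and the backslash, while the second comes out as [indented t text].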
Without -r, read would mangle backslashes.
Without IFS=, read would remove leading and trailing spaces and tabs (assuming the default value of $IFS).
printf should be used instead of echo, and $line quoted, for the same reasons as above.

¹ Though in bash $(<file) is much less of an optimisation over $(cat file), as bash still forks a child process to perform the expansion.