linux bash shell du

Handling special characters in bash script


I'm not familiar with bash scripting, so this may be a silly question, but I couldn't find the answer. I'm working on a bash script that mimics the behavior of ls -sh but uses du -sh to get file and folder sizes, and sorts the output. It is essentially du -sh * | sort -h with colors.

#!/usr/bin/bash

if [ "$#" = "0" ]
then
    du -sh *|awk -f /path/to/color-ls.awk|sort -h
else
    du -sh $@|awk -f /path/to/color-ls.awk|sort -h
fi

where color-ls.awk is:

# color-ls.awk
size=$1;
name=$2;
for (i=3; i<=NF; i++)
{
    tmp=(name " " $i);
    name=tmp
}
# filename=($0 ~ /'/)? ("\"" name "\""):("'" name "'")
filename=("'" name "'")
printf $1 " "
cmd=("ls -d " filename " --color")
system(cmd)

It is an awk script that uses ls --color to color the output of du -sh.

My script works fine with most file names, even ones containing spaces, but it has some problems with special characters that I don't know how to fix.

1. When run without arguments:

Any file name that contains a single quote gets interpreted by the shell, causing an error:

sh: 1: Syntax error: Unterminated quoted string

2. When run with arguments:

The same problem as without arguments. In addition, it interprets a file name with spaces as two names.

Example: when used on a folder named VirtualBox VMs, or when given * as an argument in my home directory, here is its output:

du: cannot access 'VirtualBox': No such file or directory
du: cannot access 'VMs': No such file or directory

3. What I want:

I want the script to skip special characters and pass them as they are to du

4. What I tried:

I tried adding double quotes before and after each file name

parse(){
    for arg in $@
    do
        printf "\"$arg\"\n"
    done
}

but it didn't seem to work. du doesn't accept quotes appended to the file name.

du: cannot access '"VirtualBox': No such file or directory
du: cannot access 'VMs"': No such file or directory

Also, replacing quotes with \' doesn't help either. Maybe I'm just doing it wrong.

# du -sh $(printf "file'name\n" |sed "s/'/\\\'/g")
du: cannot access 'file\'\''name': No such file or directory
# ls file\'name 
"file'name"

Same goes for spaces

du: cannot access 'VirtualBox\': No such file or directory
du: cannot access 'VMs': No such file or directory

5. Extra:

I'm trying to make the script work as normal ls -sh would, but with sorted output and more accurate results for folders. However, this script behaves like ls -sh -d when arguments are supplied, so lh Desktop shows the size of Desktop itself instead of the sizes of the individual files and folders inside Desktop. I believe this can be fixed with a loop that checks whether each argument is a file or a folder, runs du -sh accordingly, and then sorts.

#!/usr/bin/bash

if [ "$#" = "0" ]
then
    du -sh *|awk -f /path/to/color-ls.awk|sort -h
else
    for i in $@
    do
        if [[ -d "$i" ]]; then
            du -sh $i/* |awk -f /path/to/color-ls.awk
        else
            du -sh "$i" |awk -f /path/to/color-ls.awk
        fi
    done|sort -h
fi

I'm hoping to find the optimal way to do it.

Thanks in advance.


Solution

  • Please do not post so much in one question: one problem per question, one script per question, and so on.

    Make sure to check your scripts with shellcheck. It will catch your mistakes. See https://mywiki.wooledge.org/Quotes .

    1. When run without arguments:

    filename=("'" name "'") inside the awk script is an invalid way to pass anything containing ' to the system() call: a name with one single quote produces a command with three ' characters, so you get the unterminated quoted string error, as expected. Fix the awk script, or better, rewrite it in Bash; there is no need for awk. Alternatively, rewrite it all in Python or Perl.

    Moreover, tmp=(name " " $i); rejoins the fields with single spaces, so tabs and runs of multiple spaces in file names are lost. It's all meant to work only with nice file names.

    The script will break on newlines in filenames anyway.
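One way to quote a name safely for a shell command, as a minimal sketch: wrap it in single quotes and replace every embedded ' with '\''. The sh_quote helper below is hypothetical (it is not part of the original script), and the round trip through sh -c only stands in for what awk's system() does with the command string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap a string in single quotes so that sh reads it
# back as exactly one word, whatever characters it contains.
sh_quote() {
    local s=$1
    # Replace each ' with '\'' : close the quote, add an escaped ', reopen.
    s=${s//\'/\'\\\'\'}
    printf "'%s'" "$s"
}

name="file'name with  two spaces"
quoted=$(sh_quote "$name")

# Let a child sh parse the quoted form, much as system() would.
roundtrip=$(sh -c "printf '%s' $quoted")

printf 'quoted:    %s\n' "$quoted"
printf 'roundtrip: %s\n' "$roundtrip"
```

If the round trip returns the original name unchanged, the quoting survived a shell parse, including the single quote and the double space.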

    2. When run with arguments:

    Unquoted $@ undergoes word splitting and filename expansion (topics you should research). Word splitting splits the result into words on the characters in IFS (space, tab, and newline by default). Use "$@" instead. Quote the expansions.
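The difference between $@ and "$@" can be seen by counting the words each one produces. This is a small illustration only; count_args is a hypothetical helper and the two file names are taken from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# Simulate calling the script with two file names.
set -- "VirtualBox VMs" "file'name"

unquoted=$(count_args $@)     # word splitting breaks "VirtualBox VMs" in two
quoted=$(count_args "$@")     # each original argument stays one word

echo "unquoted: $unquoted, quoted: $quoted"
# prints: unquoted: 3, quoted: 2
```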

    3. What I want:

    You'll be doing that with "$@"

    4. What I tried:

    The variable's content is irrelevant. You have to change the way you use the variable, not its content: put quotes around the use of the variable, not inside its value.
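To illustrate why quotes stored inside a value do not help: they are ordinary characters by the time the variable is expanded, so word splitting still happens and the quote characters end up in the words, which matches the du errors in the question. show_words is a hypothetical helper that brackets each word it receives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every argument wrapped in <...>.
show_words() { printf '<%s>' "$@"; }

name='VirtualBox VMs'
embedded="\"$name\""            # quotes are now part of the data

wrong=$(show_words $embedded)   # unquoted use: split on the space
right=$(show_words "$name")     # quoted use: one word, no extra characters

echo "$wrong"   # prints: <"VirtualBox><VMs">
echo "$right"   # prints: <VirtualBox VMs>
```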

    5. Extra:

    You did not quote the expansion. Use "$i", not $i; for the directory contents it's "$i"/*. Unquoted $i undergoes word splitting. The same applies to for i in $@: use "$@".
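A possible shape for that loop, as a sketch rather than a finished script: collect the targets in an array, expanding each directory to its contents with "$i"/*, and hand the array to du once. The temporary directory and the file names in it are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Demo setup (hypothetical files): one directory with a space in its name
# and one plain file with a single quote in its name.
demo=$(mktemp -d)
mkdir "$demo/VirtualBox VMs"
touch "$demo/VirtualBox VMs/disk one.vdi" "$demo/VirtualBox VMs/notes" "$demo/file'name"
set -- "$demo/VirtualBox VMs" "$demo/file'name"

targets=()
for i in "$@"; do
    if [[ -d $i ]]; then
        targets+=("$i"/*)      # the glob expands; the quoted part is not split
    else
        targets+=("$i")
    fi
done

# The array could now be passed on as: du -sh -- "${targets[@]}"
printf '%s\n' "${targets[@]##*/}"

rm -rf -- "$demo"
```

Note that with default shell options "$i"/* stays a literal pattern when the directory is empty, so an empty directory would need separate handling (for example with shopt -s nullglob).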


    And finally, after all that, your script may look like this, with GNU tools:

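    # With no arguments, default to everything in the current directory.
    if (($# == 0)); then
       set -- *
    fi
    # -0 ends each du record with NUL instead of a newline (GNU du).
    du -hs0 -- "$@" |
    sort -zh |
    # Turn the first tab (between size and name) into NUL as well.
    sed -z 's/\t/\x00/' |
    while IFS= read -r -d '' size && IFS= read -r -d '' file; do
       printf "%s " "$size"
       ls -d --color -- "$file"
    done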
    

    Also see How can I find and safely handle file names containing newlines, spaces or both? https://mywiki.wooledge.org/BashFAQ/001 .
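As a variant of the loop in the script above, the size and the name can also be separated in bash itself by cutting each NUL-terminated record at its first tab, which avoids the sed step. This is a sketch under the assumption that records look like the output of du -hs0 (size, tab, name, NUL); split_records is a hypothetical helper and the input is simulated with printf instead of running du.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read NUL-terminated "size<TAB>name" records from stdin
# and print "size|name" for each one.
split_records() {
    local rec size file
    while IFS= read -r -d '' rec; do
        size=${rec%%$'\t'*}   # text before the first tab
        file=${rec#*$'\t'}    # everything after it, later tabs and newlines intact
        printf '%s|%s\n' "$size" "$file"
    done
}

# Simulated du output: the second name contains a tab and a newline.
out=$(printf '%s\t%s\0' 4.0K plain 8.0K $'weird\tna\nme' | split_records)
printf '%s\n' "$out"
```

Because the record separator is NUL, the newline inside the second name stays part of the name instead of starting a new record.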

    Also, you can pipe the output of any compound statement:

    if stuff; then
       stuff1
    else
       stuff2
    fi | 
    sort -h |
    awk -f yourscript
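A small runnable illustration of piping a whole if statement, with made-up sizes in place of real du output. The sizes function is hypothetical, and sort -h is assumed to be available as in the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one of two lists of sizes, then sort the result.
sizes() {
    if [ "$1" = few ]; then
        printf '%s\n' 2K 1K
    else
        printf '%s\n' 3M 1G 2K
    fi |
    sort -h   # a single sort receives the output of whichever branch ran
}

few=$(sizes few)
many=$(sizes many)
printf '%s\n' "$few" "$many"
```

The pipe is attached to fi, so neither branch needs its own sort.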
    

    And also don't repeat yourself - use bash arrays:

    args=()
    if stuff; then
      args=(*)
    else
      args=("$@")
    fi
    du -hs "${args[@]}" | stuff...
    

    And so that sort has less work to do, I would put it right after du, not after parsing.