bash find-util

What would be the fastest way to find and remove files?


With:

Find="$(find / -name "*.txt" )"
du -hc "$Find" | tail -n1
echo "$Find" | xargs rm -r

If a file named foo bar.txt is found, du does not count it and rm does not remove it. What would be the best way to escape the spaces?


Solution

  • If your filenames cannot contain embedded newlines (filenames that do are very unusual), you can use the following:

    Note: To prevent accidental deletion of files while you experiment with these commands, I've replaced the input directory / (as used in the question) with /foo.

    # Read all filenames into a Bash array; embedded spaces in
    # filenames are handled correctly.
    IFS=$'\n' read -d '' -ra files < <(find /foo -name "*.txt")
    
    # Run the `du` command:
    du -hc "${files[@]}" | tail -n 1
    
    # Delete the files.
    rm -r "${files[@]}"
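
    If filenames with embedded newlines do have to be handled, the same array-based approach can be made NUL-delimited. The following is only a sketch: it assumes Bash 4.4 or later (for `mapfile -d`) and a `find` that supports `-print0` (GNU and BSD both do). The scratch directory and the `demo_dir` variable are made up for illustration, so nothing real gets deleted.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with three sample files, one of which
# has a space in its name and one an embedded newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/foo bar.txt" "$demo_dir/new"$'\n'"line.txt"

# Read all NUL-terminated filenames into a Bash array.
# -d '' makes NUL the delimiter; -t strips it from each element.
mapfile -d '' -t files < <(find "$demo_dir" -name '*.txt' -print0)
echo "found ${#files[@]} files"

# Guard against an empty array: with no operands, du would report on the
# current directory and rm would fail with a usage error.
if (( ${#files[@]} > 0 )); then
  du -hc -- "${files[@]}" | tail -n 1
  rm -r -- "${files[@]}"
fi

# Count what is left, then clean up the scratch directory.
remaining=$(find "$demo_dir" -name '*.txt' | wc -l)
echo "remaining: $remaining"
rmdir "$demo_dir"
```

    The `--` before the array guards against filenames that start with a dash being taken for options.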
    

    If you don't need to collect all filenames ahead of time and don't mind running find twice, you can use a single find command for each task (plus a pipe to tail for the du total). This is also the most robust option. The only caveat: if there are so many files that they don't fit on a single command line, find invokes du more than once, and tail -n1 then shows only the total from the last invocation.

    # The `du` command
    find /foo -name "*.txt" -exec du -hc {} + | tail -n1
    
    # Deletion.
    # Note that both GNU and BSD `find` support the `-delete` primary,
    # which deletes files and empty directories (it does not remove
    # non-empty directories).
    # However, `-delete` is not POSIX-compliant (a POSIX-compliant alternative is to
    # use `-exec rm -r {} +`).
    find /foo -name "*.txt" -delete
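
    On systems with GNU coreutils, the caveat about du being invoked more than once can be sidestepped by letting du read the filenames from standard input. This is a sketch that assumes GNU du (for `--files0-from`) and a find with `-print0`; the scratch directory and `demo_dir` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with two small sample files.
demo_dir=$(mktemp -d)
printf 'hello\n' > "$demo_dir/a.txt"
printf 'world\n' > "$demo_dir/foo bar.txt"

# find emits NUL-terminated names; GNU du reads them all from stdin
# (--files0-from=-), so there is exactly one du process and therefore
# one grand total, no matter how many files match.
total_line=$(find "$demo_dir" -name '*.txt' -print0 |
  du -hc --files0-from=- | tail -n 1)
echo "$total_line"

# Clean up the scratch directory.
rm -r -- "$demo_dir"
```

    Because the names travel through a pipe rather than the command line, the argument-length limit does not apply here.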
    

    Using + to terminate the command passed to -exec is crucial: it instructs find to pass as many matches as will fit on a single command line to the target command. Typically, but not necessarily, this results in a single invocation. In effect, -exec ... + acts like a built-in xargs, except that embedded whitespace in arguments is not a concern.

    In other words: -exec ... + is not only more robust than piping to xargs but also more efficient, because it needs neither a pipeline nor another utility.
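
    The difference in whitespace handling can be checked without deleting anything. The sketch below counts how many arguments the target command receives in each case. It assumes `mktemp -d` returns a path without whitespace; `demo_dir`, `via_xargs` and `via_exec` are names made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with two files, one containing a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/foo bar.txt" "$demo_dir/baz.txt"

# Plain xargs splits its input on blanks and newlines, so the path of
# "foo bar.txt" arrives as two separate arguments (3 in total).
via_xargs=$(find "$demo_dir" -name '*.txt' | xargs sh -c 'echo "$#"' sh)

# -exec ... + passes each matched path as exactly one argument (2 in total).
via_exec=$(find "$demo_dir" -name '*.txt' -exec sh -c 'echo "$#"' sh {} +)

echo "arguments via xargs: $via_xargs; via -exec +: $via_exec"

# Clean up the scratch directory.
rm -r -- "$demo_dir"
```

    The inner `sh -c 'echo "$#"' sh` only prints its argument count, so the broken word splitting is visible without rm ever being handed a bogus path.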