I intend to write a script that groups files by their filename prefix and tars together the files that share the same prefix. I have no list of the prefixes, so I need to build it from the filenames themselves.
Files have names like:
top-1.parquet
top-2.parquet
side-1.parquet
side-2.parquet
bot-tom-1.parquet
bot-tom-2.parquet
right-left-1.parquet
right-left-2.parquet
To do so, I started with this script.
RMT_PATH_DATA='/home/me/Documents/code/data'
while IFS= read -r -d $'\n' root_name
do
    # Work out tar here
    echo "Working file $root_name"
    ls "$root_name"*.parquet
done < <(find "$RMT_PATH_DATA" -maxdepth 1 -name "*.parquet" -print0 | rev | cut -f 2- -d '-' | rev | sort -zu)
(This script is more or less copied from the accepted answer here on SO.)
The logic of the last line is to reverse each filename retrieved with find, trim the trailing digit and extension so that only the prefix remains, and reverse the result back. The trimming is done by first reversing the filename and then using cut to keep everything from the 2nd field of the reversed name onwards (- is the field delimiter, and it can appear a variable number of times in the prefix itself, which is why the counting has to start from the end of the name).
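A small sketch of the trimming step on its own, fed with newline-separated names. The helper name strip_suffix is hypothetical; it only wraps the rev | cut | rev part of the pipeline above.

```shell
#!/usr/bin/env bash
# strip_suffix is a hypothetical helper wrapping the pipeline from the script:
# reversing each line makes the trailing "-<n>.parquet" the FIRST '-'-separated
# field, cut -f 2- drops that field, and the second rev restores the original
# character order, leaving only the prefix.
strip_suffix() {
    rev | cut -f 2- -d '-' | rev
}

# rev and cut work line by line, so the input must be newline-separated.
printf '%s\n' top-1.parquet bot-tom-1.parquet right-left-2.parquet | strip_suffix
# prints:
# top
# bot-tom
# right-left
```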
My trouble appears with the rev and cut commands.
The find command outputs the list of parquet files in the data directory, but rev and cut appear to process only the 1st item of the list and discard the others.
How can I make them process the full list?
Thanks for your help! Best regards
PS: I have not built the tar part yet; for now I only run echo and ls to check what is being processed in the loop. Currently only one iteration runs, because of the problem described above.
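The problem is the -print0 option you use in find: with it, the found items are separated by NUL characters, not by newlines. In How to concatenate files that have the same beginning of a name? they used cut with the -z option, which is the counterpart of -print0. The rev command does not have an option to use the NUL delimiter as far as I can see, so the simplest fix is to drop -print0 from find and use sort -u instead of sort -zu, so that every command in the pipeline works on newline-separated lines (this assumes no filename contains a newline).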
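A minimal sketch of the whole task with that change applied, wrapped in a function. The function name tar_by_prefix, the "<prefix>.tar" archive name and the "-<digits>.parquet" naming rule are assumptions for illustration, and filenames are assumed not to contain newlines.

```shell
#!/usr/bin/env bash
# Sketch: tar together *.parquet files that share a prefix.
# Assumptions: names end in "-<digits>.parquet" and contain no newlines;
# tar_by_prefix and the "<prefix>.tar" archive name are hypothetical.

tar_by_prefix() {
    local data_dir=$1 root_name prefix
    # No -print0 here: every stage of the pipeline is newline-delimited,
    # so rev and cut see one path per line and sort -u removes duplicate prefixes.
    while IFS= read -r root_name
    do
        prefix=${root_name##*/}   # strip the directory part of the path
        echo "Working on prefix $prefix"
        # Subshell so the cd does not leak; archive members get bare file names.
        ( cd "$data_dir" && tar -cf "$prefix.tar" "$prefix"-[0-9]*.parquet )
    done < <(find "$data_dir" -maxdepth 1 -name "*.parquet" | rev | cut -f 2- -d '-' | rev | sort -u)
}
```

Called as tar_by_prefix /home/me/Documents/code/data, it should create top.tar, side.tar, bot-tom.tar and right-left.tar in the data directory. The "$prefix"-[0-9]*.parquet glob, rather than "$prefix"*.parquet, is meant to keep a prefix such as bot from also picking up bot-tom files.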