I have file names like this:
/foo/bar/bazz/JMA01023D_E07/JMA01023D_E07_EKDL230054768-1A_22HFKNLT3_L4_1.fq.gz
/foo/bar/bazz/JMA01023D_E08/JMA01023D_E08_EKDL230054768-1A_22HFKNLT3_L4_1.fq.gz
/foo/bar/bazz/JMA01023D_E09/JMA01023D_E09_EKDL230054768-1A_22HFKNLT3_L4_1.fq.gz
/foo/bar/bazz/JMA01022D_E06/JMA01022D_E06_EKDL230054767-1A_22HF2WLT3_L7_1.fq.gz
/foo/bar/bazz/JMA01001D_A01/JMA01001D_A01_EKDL230054750-1A_222T7MLT4_L1_1.fq.gz
/foo/bar/bazz/JMA01001D_A02/JMA01001D_A02_EKDL230054750-1A_222T7MLT4_L1_1.fq.gz
When sorted alphabetically by full path, every 3 consecutive files form a triplet. I would like to print the parent folder names of each triplet on one line.
So the desired output would be:
JMA01001D_A01 JMA01001D_A02 JMA01022D_E06
JMA01023D_E07 JMA01023D_E08 JMA01023D_E09
Something like this:
find "$@" -iname '*_1.fq.gz' | sort | xargs -I % -n3 sh -c echo % | sed -r 's/ *[^ ]*\/([^ ]+)\/([^ ]+)/\1 /g\'
Ideally I would also like to support spaces in file names, so something using find -print0, sort -z and xargs -0.
But I just can't seem to get it to work.
Could someone please help me untangle my brain?
It doesn't have to use sed; something with dirname/basename or awk would be fine as well...
You can use awk to get the folder name and pipe that into xargs -n 3 to get 3 items per line. With / as the field separator, $(NF-1) is the second-to-last path component, i.e. the parent folder:
... | awk -F'/' '{print $(NF-1)}' | xargs -n 3
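The awk step itself copes with spaces, because it only splits on /. It is xargs -n 3 that would break a folder name such as "my dir" into two words (xargs also treats quotes and backslashes specially). A minimal sketch that does the grouping inside awk instead; parent_names_by3 is a hypothetical function name, and it still assumes newline-separated input, so paths containing newlines are not handled:

```shell
# parent_names_by3 (hypothetical name): read newline-separated paths on stdin
# and print their parent folder names, three per line.
parent_names_by3() {
  awk -F'/' '
    {
      # Start a new group on the 1st, 4th, 7th... line; otherwise append.
      line = (NR % 3 == 1) ? $(NF-1) : line " " $(NF-1)
      if (NR % 3 == 0) print line      # a group of three is complete
    }
    END { if (NR % 3) print line }     # flush a short final group
  '
}
```

Usage: sort /tmp/foo | parent_names_by3 prints the same two lines as the xargs version for the sample input, but keeps "my dir" together as one name.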
So if I place your input in /tmp/foo and run the following:
sort /tmp/foo | awk -F'/' '{print $(NF-1)}' | xargs -n 3
The output is:
JMA01001D_A01 JMA01001D_A02 JMA01022D_E06
JMA01023D_E07 JMA01023D_E08 JMA01023D_E09
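The pipelines above still rely on newline-separated paths. Since the question asks for find -print0, sort -z and xargs -0, here is a sketch of that NUL-delimited variant, assuming your find, sort and xargs support those options (the GNU and BSD versions do). parent_triplets is a hypothetical function name, and LC_ALL=C is my own choice to make the sort order byte-wise and independent of locale:

```shell
# parent_triplets (hypothetical name): print parent folder names, three per line.
# Paths travel NUL-separated, so spaces (and even newlines) in paths are safe.
# LC_ALL=C gives a byte-wise, locale-independent sort order.
parent_triplets() {
  find "$@" -iname '*_1.fq.gz' -print0 |
    LC_ALL=C sort -z |
    xargs -0 -n 3 sh -c '
      line=
      for f in "$@"; do
        d=${f%/*}                      # strip the file name
        line="$line${line:+ }${d##*/}" # append the last directory component
      done
      printf "%s\n" "$line"
    ' sh
}
```

Usage: parent_triplets /foo/bar/bazz. Each xargs batch of up to 3 paths becomes one output line, so a short final batch simply prints fewer names. Note that if nothing matches, GNU xargs still runs the command once and prints an empty line.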