I want the unique headers for a bunch of csv files whose names contain ABC or XYZ.
Within a single directory, I can sort of get what I need with:
head -n 1 *.csv > first.txt
cat -A first.txt | tr ',' '\n' | sort | uniq
Of course, this isn't recursive and it includes all csv files, not just the ones I want.
If I do the following, I get the recursive search, but also a bunch of junk:
find . -type f -name "ABC*.csv" -o -name "XYZ*.csv" | xargs head -n 1 | tr ',' '\n' | sort | uniq
I'm on Windows 10 with MinGW64. I suppose I could use Python, but I feel so close to having it!
When head is given multiple files (which is what xargs does here), it prints their names as well.
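For illustration, here is what those banners look like (the /tmp path and file names below are throwaway examples, not from the question):

```shell
# head prints "==> name <==" banners when given more than one file;
# those banners are the junk that ends up in the sorted output.
mkdir -p /tmp/head_demo
printf 'id,name\n1,x\n' > /tmp/head_demo/a.csv
printf 'id,date\n2,y\n' > /tmp/head_demo/b.csv
head -n 1 /tmp/head_demo/a.csv /tmp/head_demo/b.csv
```

With GNU head (which MinGW64 ships), head -q -n 1 suppresses the banners, which is another way around the problem.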
Using find's -exec action you can obtain the desired result (you must force the precedence of -name 'ABC*.csv' -o -name 'XYZ*.csv' with \( \) for it to work). uniq is also not required here; sort can deduplicate on its own with -u. And as a side note, it's better to enclose literal strings in single quotes.
find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr ',' '\n' | sort -u
If your files have DOS line endings, the above command will not work though. In that case you should delete carriage returns using tr or sed:
find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | tr -d '\r' | tr ',' '\n' | sort -u
# or
find . -type f \( -name 'ABC*.csv' -o -name 'XYZ*.csv' \) -exec head -n 1 {} \; | sed 's/\r//; s/,/\n/g' | sort -u
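To see why the carriage returns matter (the /tmp path below is a throwaway example): with CRLF endings the \r rides along on the last column name, so otherwise-identical headers no longer compare equal under sort -u.

```shell
printf 'id,name\r\n1,x\r\n' > /tmp/dos.csv

# Without stripping, the last field is really "name\r";
# cat -A renders it as "name^M$".
head -n 1 /tmp/dos.csv | tr ',' '\n' | cat -A

# With tr -d '\r' the header splits cleanly into id and name.
head -n 1 /tmp/dos.csv | tr -d '\r' | tr ',' '\n'
```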