Imagine I have a directory containing many subdirectories, each containing some number of CSV files with the same structure (the same number of columns and the same header).
I am aware that I can run something like this from the parent folder:
find ./ -name '*.csv' -exec cat {} \; > ~/Desktop/result.csv
This works fine, except that the header is repeated once for each file.
I'm also aware that I can use something like sed 1d <filename> or tail -n +2 <filename> to skip the first line of a file (more generally, tail -n +<N+1> <filename> skips the first N lines).
But my case seems a bit more specialised: I want to keep the header from the first file only and skip it in every file after that.
Is anyone aware of a way to achieve this using standard Unix tools (like find, head, tail, sed, awk etc.) and bash?
For example, given these input files:
folder1/file1.csv
folder1/file2.csv
folder2/file1.csv
Each file has the header:
A,B,C
and a single data row:
1,2,3
The desired output would be:
A,B,C
1,2,3
1,2,3
1,2,3
I think this differs from other questions like this and this, because their solutions name file1 and file2 explicitly. My question is about a directory tree with an arbitrary number of files, where I don't want to type out each file name one by one.
You may use this find + xargs + awk pipeline:
find . -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1'
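To check the pipeline against the example in the question, here is a minimal sketch that rebuilds that tree in a temporary directory and runs the command on it. The use of mktemp and the temporary locations are assumptions for illustration only. Note that the output file is deliberately kept outside the tree being searched; otherwise find could pick up result.csv itself, which is presumably why the question writes to ~/Desktop.

```shell
#!/usr/bin/env bash
# Minimal sketch: rebuild the question's example tree in a temporary
# directory and run the answer's pipeline on it.
set -euo pipefail

tmp=$(mktemp -d)   # throwaway tree (assumption: mktemp is available)
out=$(mktemp)      # output file kept OUTSIDE the tree being searched
trap 'rm -rf "$tmp" "$out"' EXIT

cd "$tmp"
mkdir -p folder1 folder2
# Every file gets the same header and one data row, as in the question.
for f in folder1/file1.csv folder1/file2.csv folder2/file1.csv; do
  printf 'A,B,C\n1,2,3\n' > "$f"
done

# Keep line 1 of the combined input plus every non-first line of each file.
find . -name '*.csv' -print0 | xargs -0 awk 'NR==1 || FNR>1' > "$out"

cat "$out"   # A,B,C followed by three 1,2,3 lines
```

Because every data row in the example is identical, the result does not depend on the order in which find lists the files. One caveat: if the file list is long enough that xargs splits it across several awk invocations, NR restarts in each invocation and the header would appear once per batch. For very large trees, a per-file loop using tail -n +2 avoids that.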
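The condition NR==1 || FNR>1 is true for the very first line of the combined input (NR counts records across all files) or for any line that is not the first line of its own file (FNR counts records within the current file). So the first file's header is printed, and every later header is skipped.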
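To make the NR/FNR distinction concrete, this sketch prints both counters for every line of two small files, along with whether the filter would keep the line. The file names a.csv and b.csv are hypothetical and exist only for the demonstration.

```shell
#!/usr/bin/env bash
# Trace NR and FNR over two small files (hypothetical names a.csv, b.csv).
set -euo pipefail

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
printf 'A,B,C\n1,2,3\n' > a.csv
printf 'A,B,C\n4,5,6\n' > b.csv

# NR counts lines across all input files; FNR restarts at 1 for each file.
# The last column says whether NR==1 || FNR>1 would keep the line.
trace=$(awk '{
  verdict = (NR==1 || FNR>1) ? "keep" : "drop"
  printf "%s NR=%d FNR=%d %s %s\n", FILENAME, NR, FNR, $0, verdict
}' a.csv b.csv)

printf '%s\n' "$trace"
```

It should print:
a.csv NR=1 FNR=1 A,B,C keep
a.csv NR=2 FNR=2 1,2,3 keep
b.csv NR=3 FNR=1 A,B,C drop
b.csv NR=4 FNR=2 4,5,6 keep
Only the second file's header has FNR=1 with NR greater than 1, so it is the only line dropped.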