macos bash awk

bash & awk: Loop through dir running two separate awk commands on all files and saving in new dir


I asked a question yesterday and received wonderful help.

I think I am getting the hang of using awk to solve the problem but I now need to automate some of the work and hope I can do this with bash and awk as well.

To recap from the other thread:

I am using a Mac and have a bunch of text files with no unique identifier tying records to each other. The only way to tie them together is by their position in the text files, so I have to deal with them before importing into a stats package.

The solution code is:

awk '/^AB1/{ab1=$0;next}/^AB2/{print $1,$2,ab1}' file01.txt > newfile01.txt

I was having issues appending the filename to position $7 in the output file, so I ran a second awk command and it worked:

awk '{print $1,$2,$3,$4,$5,$6,FILENAME}' newfile01.txt > newnewfile01.txt

What I would like to be able to do is point the script at the directory full of these files. It would ideally run both of the above commands on all *.txt files and then either save to a new directory, keeping the same filename (if easier), or save to the same directory with a new filename (e.g. prepend 'new' to the filename).

The end result for me is that I will cat all of the new files into one massive txt file and import into the math programme. This imported file will now have the filename to help us ID where we got the row in the first place and we will have all information tying the records together on a single line/row, so we can analyze.
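For this final merge step, one possible shortcut is to let a single awk call read every file and write the combined output directly, since FILENAME always names the file currently being read. The sketch below is only an illustration under assumptions: the sample records, the temporary directory and the output name combined.dat are invented here, and the FNR==1 reset is an added safeguard so that an AB1 line from one file is never attached to AB2 lines in the next.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory with two tiny sample files (not from the question)
work=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf 'AB1 x y z\nAB2 p q\n' > "$work/file01.txt"
printf 'AB2 orphan row\nAB1 a b c\nAB2 d e\n' > "$work/file02.txt"

cd "$work"

# FNR==1 is true on the first line of each input file, so ab1 is cleared
# before a new file starts; FILENAME is the original file, not an intermediate.
# The output deliberately does not end in .txt so a rerun will not read it back in.
awk 'FNR==1{ab1=""} /^AB1/{ab1=$0;next} /^AB2/{print $1, $2, ab1, FILENAME}' *.txt > combined.dat

cat combined.dat
```

The per-file loop followed by cat gives the same rows; the single call just skips the intermediate files.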

Thank you in advance for any help/guidance.


Solution

  • Modifying your proposed solution so that it iterates over the *.txt files in the current directory:

    for f in *.txt ; do awk '/^AB1/{ab1=$0;next}/^AB2/{print $1, $2, ab1}' "$f" > "new$f"; awk '{print $1,$2,$3,$4,$5,$6,FILENAME}' "new$f" > "newnew$f"; done
    

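    In that version FILENAME is the name of the intermediate file (e.g. newfile01.txt), because that is the file the second awk reads. I suspect you want the name of the original file instead, in which case you can print FILENAME in the first awk command and drop the second one:

    for f in *.txt ; do awk '/^AB1/{ab1=$0;next}/^AB2/{print $1, $2, ab1, FILENAME}' "$f" > "new$f"; done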
    

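    Finally, the following multi-line version of the first solution makes it easier to see what's going on:

    for f in *.txt
    do
        # Step 1: remember each AB1 line and attach it to the AB2 lines that follow
        awk '/^AB1/{ab1=$0;next}/^AB2/{print $1, $2, ab1}' "$f" > "new$f"
        # Step 2: append the name of the file being read (here the intermediate "new$f")
        awk '{print $1,$2,$3,$4,$5,$6,FILENAME}' "new$f" > "newnew$f"
    done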
    

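    You can try these and adapt them to your specific requirements. Because the output files are written to the same directory and also end in txt, the glob will pick them up on a second run, so move or delete the new* files before running the loop again.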
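The question also mentioned saving the results to a separate directory under the same filenames, which avoids mixing input and output files. A minimal sketch of that variant follows, with several assumptions: the in/out directory names and the sample records are made up for illustration, and the filename is passed in with awk's -v option (as a variable called name) rather than taken from FILENAME, because FILENAME would include the directory path when the loop is given full paths.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: inputs in $indir, results in $outdir (both invented for this demo)
work=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
indir="$work/in"
outdir="$work/out"
mkdir -p "$indir" "$outdir"

# Two small sample files in the AB1/AB2 layout (sample data, not from the question)
printf 'AB1 x y z\nAB2 p q\nAB2 r s\n' > "$indir/file01.txt"
printf 'AB1 a b c\nAB2 d e\n' > "$indir/file02.txt"

for f in "$indir"/*.txt; do
    [ -e "$f" ] || continue          # skip the literal pattern if nothing matched
    name=$(basename "$f")            # e.g. file01.txt, without the directory part
    # Keep the latest AB1 line and print it, plus the bare filename, with each AB2 line
    awk -v name="$name" '/^AB1/{ab1=$0;next} /^AB2/{print $1, $2, ab1, name}' "$f" > "$outdir/$name"
done

echo "Results written to $outdir"
```

Since the outputs live in their own directory, the loop can be rerun without reprocessing its own results, and cat "$outdir"/*.txt would then build the single import file.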