linux awk

How to iterate over all folders and their subfolders and have AWK process each TXT file in the subfolders?


I want to iterate over all folders and their subfolders and print the names of the .TXT files (in the subfolders) whose first line contains the string CYCLE DATE (there may be spaces and/or underscores between CYCLE and DATE). Here's my attempt at solving this:

In files_and_folders.sh I entered this:

#!/bin/bash
find . -name '*.TXT' -exec awk 'NR == 1 && $0 ~ /CYCLE[_ ]+DATE/ { print FILENAME }'

At the bash command line I entered this:

bash files_and_folders.sh

That produced the following error message:

find: missing argument to -exec

What is the correct way to do this?


Solution

  • I'd split this problem like this:

    1. Go over all files
    2. for each file:
      1. get the first line only
      2. check for CYCLE DATE
      3. print file name if found.

    So,

    #!/bin/bash
    # Don't error on no file name matches:
    shopt -s nullglob
    # Enable recursive ** glob:
    shopt -s globstar
    
    for file in **/*.TXT ; do
      # first line only   # look for regex              # print file name
      #                   #  -q:   silently             #
      # -n 1: one line    #  -E: extended regexes       #
      head -n 1 "${file}" | grep -q -E 'CYCLE[_ ]+DATE' && echo "${file}"
      # or, using your more elegant awk one-liner:
      # awk 'NR == 1 && $0 ~ /CYCLE[_ ]+DATE/ { print FILENAME }' "${file}"
    done
    

    Of course, instead of grep you can use awk to analyze the line, but that's unnecessarily complex here. Your regular expression is very simple (CYCLE, then one or more spaces or underscores, then DATE), so a simple regex tool like grep can do the job.


    The problem with your find command is that nothing follows the awk program after -exec: there is no '{}' to mark where find should insert the name of the file it found, and no ';' to mark where the command to execute ends. That is why find reports a missing argument to -exec.

    That said, this task doesn't need find at all. Personally, I find for file in GLOB; do … done easier to remember than find -name 'PATTERN' -exec Some complicated syntax '{}' ';'.
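
If you would rather keep find, the original command can probably be repaired by adding the two missing pieces, '{}' and ';'. The sketch below wraps it in a function; the name list_cycle_date_files and the optional directory argument are my own additions for illustration and are not part of the original script. It also adds -type f and stops awk after the first line, which the original did not do.

```shell
#!/bin/bash
# list_cycle_date_files is a hypothetical helper name, not part of the original script.
# It prints every .TXT file under the given directory (default: .)
# whose first line matches CYCLE[_ ]+DATE.
list_cycle_date_files() {
  # '{}' is replaced by the file find located; ';' ends the -exec command.
  find "${1:-.}" -type f -name '*.TXT' -exec awk '
    NR == 1 {
      if ($0 ~ /CYCLE[_ ]+DATE/) print FILENAME
      exit    # only the first line matters, so stop reading here
    }
  ' '{}' ';'
}

list_cycle_date_files "$@"
```

This runs one awk process per file, so NR == 1 is enough. If you end -exec with + instead of ';', find passes many files to a single awk, and the test would have to use FNR == 1 without the exit.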