awk recent-file-list

Process the latest version of a file ..... as the 2nd file of a pair of files that are to be processed


Using gawk, I want to process two files in a directory. The first file has a fixed name. The name of the second file starts with a constant prefix but ends in a date and time stamp, which changes every time the file is created. I want to use the latest version of the second file.

I have seen an answer to a similar but simpler question, "how to pass the most recent file from a directory to awk input file?", and the code ls -lr 2nd_file_*| tail -n 1 does show me the latest file. However, I do not know how to pass the found file name to gawk as the second file.

Currently I type the date/time stamp into the gawk command line, e.g.:

gawk -F[,"\t""}"] '{ do something }' file_1 2nd_file_2024_03_21_[18-21-32] > output_file

Does anyone know how I can do this? Thanks.

I haven't tried anything as I haven't a clue how to.


Solution

  • Setting aside the various issues with parsing 'ls' output, one simple approach is to replace the 2nd file argument (to the awk script) with a command substitution that runs the ls|tail call, e.g.:

    awk '{ do something }' file_1 $( ls -1r 2nd_file_* | tail -n 1 )
    

    NOTE: OP has stated that this particular ls|tail combo provides the desired file name, so I'm merely copying it here as an example.


    To see this in action we'll start with some sample files:

    $ head *
    ==> 2nd_file_2024_03_21 <==
    21
    
    ==> 2nd_file_2024_03_22 <==
    22
    
    ==> 2nd_file_2024_03_23 <==
    23
    
    ==> 2nd_file_2024_03_24 <==
    24
    
    ==> file_1 <==
    line_1
    

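    To obtain the latest 2nd_file_* from these samples we need a tweak to OP's current ls|tail: drop the -r so that the names are listed in ascending order and tail -n 1 picks the last (newest) one: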

    $ ls -1 2nd_file_* | tail -n 1
    2nd_file_2024_03_24
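
    Because the shell expands a glob in sorted order, the same name can also be picked without ls at all, which sidesteps the parsing issues mentioned earlier. A minimal sketch, assuming the timestamps are zero-padded so that name order matches creation order; the scratch directory and the variable name latest are illustrative only:

    ```shell
    # Illustration only: recreate the sample files in a scratch directory.
    workdir=$(mktemp -d) && cd "$workdir" || exit 1
    for day in 21 22 23 24; do
        printf '%s\n' "$day" > "2nd_file_2024_03_$day"
    done

    # The shell expands the glob in sorted order, so the last name seen by
    # the loop is the one that sorts highest. 'latest' is a hypothetical name.
    latest=
    for f in 2nd_file_*; do
        [ -e "$f" ] && latest=$f    # -e skips the unexpanded pattern if nothing matched
    done

    printf '%s\n' "$latest"
    ```

    With the sample files this prints 2nd_file_2024_03_24. If the newest file should be chosen by modification time rather than by name, this loop is not a substitute.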
    

    Wrapping this in a command substitution and feeding it to a simple awk script that prints each input line to stdout:

    $ awk '{ print }' file_1 $( ls -1 2nd_file_* | tail -n 1 )
    line_1                                                       # line from file_1
    24                                                           # line from 2nd_file_2024_03_24
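
If the pattern might match nothing, or the name is needed more than once, it may help to capture the name in a variable first and quote it when passing it to awk. A minimal sketch under the same assumptions as the sample files above; the scratch directory, the variable name latest, and the error message are illustrative, while output_file is the name used in the question:

```shell
# Illustration only: recreate some of the sample files in a scratch directory.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'line_1\n' > file_1
printf '23\n' > 2nd_file_2024_03_23
printf '24\n' > 2nd_file_2024_03_24

# Capture the name once. 'latest' is a hypothetical variable name.
latest=$(ls -1 2nd_file_* 2>/dev/null | tail -n 1)

# Stop early if nothing matched, rather than letting awk silently
# process file_1 alone.
if [ -z "$latest" ]; then
    echo "no file matching 2nd_file_* found" >&2
    exit 1
fi

# Quoting "$latest" keeps the name intact as a single argument and
# stops the shell from re-expanding any glob characters it contains.
awk '{ print }' file_1 "$latest" > output_file
cat output_file
```

The quoting matters for names like the one in the question, where the [ and ] characters would otherwise be treated as a glob pattern by the shell.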