linux shell awk find

How to make Gawk work with files found by the “find” command with the corresponding output of the “-printf” option available as a variable?


I want to do the following:

  1. Find a particular set of files with the find command;
  2. For each file found, put the corresponding output of the -printf option into a variable called str and pass it to Gawk (and do not print/use that output anywhere else);
  3. Execute a Gawk program for the corresponding file. The contents of the corresponding str variable must be available in the program.

For example, I have the directory called /d/ir. It contains two files, file1.txt and file2.txt. The files are in the UTF-8 encoding. The file whose name is file1.txt contains the following two lines of text:

A
BC

The file size is 4 bytes.

The file whose name is file2.txt contains the following three lines of text:

D
EF
GHI

The file size is 8 bytes.

I want to print all these lines, appending the corresponding contents of str (file name, file size) to each line. So the expected output is

A;d/ir/file1.txt,4
BC;d/ir/file1.txt,4
D;d/ir/file2.txt,8
EF;d/ir/file2.txt,8
GHI;d/ir/file2.txt,8

I tried the following command:

LC_ALL=en_US.utf8; find "/d/ir" -name "file*.txt" -type f -printf "%p,%s" -execdir gawk -v str="$7" '{
print($0 ";" str)
}' "{}" \+

(Here I was hoping that $7, being a positional parameter, would refer to "%p,%s".) But it does not print the expected output: it prints the two -printf outputs (which I do not want), followed by five lines without the needed data from str.
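A minimal sketch of why this attempt cannot work, assuming a shell with no positional parameters set (the `set --` line below forces that state for the demonstration, and `str_arg` is a name introduced only for illustration). The shell expands "$7" itself before find starts, so gawk receives an empty str; find does not substitute -printf output into the arguments of -execdir. Separately, -printf is an action in its own right and writes to standard output regardless of what -execdir does.

```shell
# Clear the positional parameters so that $7 is unset, as in a fresh shell.
set --

# The shell performs this expansion before find (or gawk) ever runs.
str_arg="str=$7"

printf '%s\n' "$str_arg"   # prints: str=
```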

What is the correct command that solves the problem? Note that I do not want the outputs of the -printf option to be shown/printed outside of the Gawk context: I only want to pass them to Gawk so that it is only the Gawk program that knows how to use them. If the Gawk program does not use them at all, they must not be shown anywhere.

Since the command will be used for many files, maximization of performance and minimization of memory consumption are important.


Solution

  • Using any awk:

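    find 'd/ir' -name 'file*.txt' -type f -printf '%s %p\n' |
    awk '
        {
            size = $1                  # first field: file size in bytes
            sub(/[^ ]+ /,"")           # drop the size; the rest of the record is the path
            file = $0                  # the path may contain spaces, but not newlines
            while ( (getline line < file) > 0 ) {   # read the file line by line
                print line ";" file "," size
            }
            close(file)                # release the file before moving to the next one
        }
    '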
    A;d/ir/file1.txt,5
    BC;d/ir/file1.txt,5
    D;d/ir/file2.txt,9
    EF;d/ir/file2.txt,9
    GHI;d/ir/file2.txt,9
    

    Set LC_ALL to whatever you need; it doesn't affect the logic. See http://awk.freeshell.org/AllAboutGetline for more info on using getline.
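To reproduce the output above, the following sketch builds the example tree in a scratch directory and runs the same pipeline. It assumes GNU find (for -printf) and mktemp are available; the variable names `tmp` and `out` are introduced here for illustration. The sizes 5 and 9 arise when each file ends with a newline; the 4 and 8 quoted in the question correspond to files without a final newline.

```shell
# Scratch directory for the demonstration (hypothetical location chosen by mktemp).
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Recreate the example files; each ends with a trailing newline.
mkdir -p d/ir
printf 'A\nBC\n'      > d/ir/file1.txt   # 5 bytes
printf 'D\nEF\nGHI\n' > d/ir/file2.txt   # 9 bytes

# find emits "size path" per file; awk reads each file itself via getline,
# so the -printf output never reaches the terminal on its own.
out=$(
    find d/ir -name 'file*.txt' -type f -printf '%s %p\n' |
    awk '
        {
            size = $1
            sub(/[^ ]+ /, "")
            file = $0
            while ( (getline line < file) > 0 )
                print line ";" file "," size
            close(file)
        }
    '
)

printf '%s\n' "$out"
```

The order in which find visits the two files is not guaranteed, so the block for file2.txt may appear before the block for file1.txt; pipe the find output through sort first if a fixed order matters.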