linux awk

Capture fields into an array, splitting on spaces except inside quoted strings, using AWK


I am trying to populate a Bash array under WSL2 by parsing an input text file with awk. I want to extract the line below YOLOargs: into an array, splitting on spaces, except that the quoted file path should be kept whole as a single element.

Here is the example text file (argstext):

Pipelines:
YOLO-Goby

Pipeline Container Arguments:
Bubblerargs:
python Collect_Unpack.py --primary_images --processes 1 --every_nth 25 --output_folder

YOLOargs:
python 03_YOLO_infer_no_labels.py --img_list_txt --output_name inference_output_test --confidence 0.1 --weights "/mnt/c/Users/jmilitello/OneDrive - DOI/ARIS_MLM_Files/Literature/Literature for Pete/Code/best.pt"

So far I have only used the following to extract the line below YOLOargs:

YOLOcontainerargs=($(awk -F' ' 'c&&c--;/.*YOLOargs:/{c=1}' "${argstext}")) 

This creates an array, but the file path is split at every space instead of being kept as a single element.
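A minimal sketch of why this happens, assuming a made-up sample line (the `line` variable and the `/tmp/my dir` path are hypothetical, not taken from argstext): when the result of a command substitution is left unquoted, Bash splits it on whitespace and treats any double quotes in it as ordinary characters rather than as quoting syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sample line standing in for the awk output.
line='--weights "/tmp/my dir/best.pt"'

# Unquoted command substitution: the shell word-splits the result on
# whitespace. The double quotes inside the data are literal characters,
# so they do not protect the space in the path.
split=($(printf '%s\n' "$line"))

# The path ends up spread over two elements, for three elements in total.
declare -p split
```

So the splitting has to be controlled before the data reaches the array assignment, for example by emitting one field per line and reading the lines with readarray.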


Solution

  • If you're trying to populate a bash array then this might be what you want, using GNU awk for FPAT:

    $ cat tst.sh
    #!/usr/bin/env bash
    
    readarray -t YOLOcontainerargs < <(
        awk -v FPAT='[^[:space:]]+|("([^"]|"")*")' -v OFS='\n' '
            f { $1=$1; print; exit }
            /^YOLOargs:/ { f = 1 }
        ' "${@:--}"
    )
    
    declare -p YOLOcontainerargs
    

    $ ./tst.sh argstext
    declare -a YOLOcontainerargs=([0]="python" [1]="03_YOLO_infer_no_labels.py" [2]="--img_list_txt" [3]="--output_name" [4]="inference_output_test" [5]="--confidence" [6]="0.1" [7]="--weights" [8]="\"/mnt/c/Users/jmilitello/OneDrive - DOI/ARIS_MLM_Files/Literature/Literature for Pete/Code/best.pt\"")
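
Note that element 8 in the output above still carries its literal double quotes. If the array is later passed to a command, those quotes would be part of the argument. A possible follow-up step, sketched here with a shortened hypothetical array instead of the real one, strips one pair of surrounding quotes from each element using only Bash parameter expansion:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the array filled by readarray; the path is made up.
YOLOcontainerargs=(--weights '"/tmp/my dir/best.pt"')

for i in "${!YOLOcontainerargs[@]}"; do
    arg=${YOLOcontainerargs[i]}
    # Only touch elements that both start and end with a double quote.
    if [[ $arg == \"*\" ]]; then
        # Drop the first and last character, keeping everything in between.
        YOLOcontainerargs[i]=${arg:1:${#arg}-2}
    fi
done

declare -p YOLOcontainerargs
```

This sketch does not try to undo doubled quotes inside a quoted field; that is an assumption that fits the sample input, where the path contains none.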
    

    See also the answers to a very similar question posted earlier today: "Convert an array stored in a string variable back to an array".
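
FPAT is a GNU awk extension, so the script above will not split fields this way in other awks. As a rough alternative for when gawk is not available, the same idea can be approximated with a match() loop that peels off one token at a time. This is only a sketch: the `sample` text and the `args` array name are hypothetical, and the pattern is a simplified one that does not handle doubled quotes inside a quoted string.

```shell
#!/usr/bin/env bash
# Hypothetical sample input mirroring the layout of argstext.
sample='Bubblerargs:
python Collect_Unpack.py --every_nth 25

YOLOargs:
python infer.py --confidence 0.1 --weights "/tmp/my dir/best.pt"'

readarray -t args < <(
    awk '
        f {
            rest = $0
            # Repeatedly take the leftmost token: either a double-quoted
            # string or a run of non-blank characters, one per output line.
            while (match(rest, /"[^"]*"|[^ \t]+/)) {
                print substr(rest, RSTART, RLENGTH)
                rest = substr(rest, RSTART + RLENGTH)
            }
            exit
        }
        /^YOLOargs:/ { f = 1 }
    ' <<< "$sample"
)

declare -p args
```

As with the FPAT version, the quoted path comes back as a single element with its quotes still attached.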