
Processing text with multiple delimiters in awk


I have text that looks like this:

Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320 
Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320 

I want the number that follows count: on each line (1 and 5 here), and I want to store these numbers in an array.

nums=($(echo -n "$grepResult" | awk -F ':' '{ print $4 }' | awk -F '|' '{ print $1 }'))

This seems repetitive and not very efficient. Any ideas on how to simplify it?


Solution

  • You can call awk once with the field separator set to |, then loop over all the fields and split each one on :

    If the part before the colon is exactly count, print the part after it.

    This way the count: part can occur anywhere in the line, and if it occurs more than once, each value is printed.

    nums=($(echo -n "$grepResult" |  awk -F'|' '
    {
      for(i=1; i<=NF; i++) {
        split($i, a, ":")
        if (a[1] == "count") {
          print a[2]
        }
      }
    }
    '))
    
    for i in "${nums[@]}"
    do
       echo "$i"
    done
    

    Output

    1
    5
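
The `nums=($(...))` form fills the array by word-splitting the command output. As a minimal sketch of an alternative, assuming bash 4 or newer (where the `mapfile` builtin is available), the same awk loop can feed the array one line per element. The `grepResult` value below is a hypothetical stand-in built from the sample lines in the question.

```bash
#!/usr/bin/env bash
# Hypothetical sample input standing in for the real $grepResult.
grepResult='Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320
Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320'

# mapfile reads one array element per line; -t strips the trailing newline.
mapfile -t nums < <(printf '%s\n' "$grepResult" | awk -F'|' '
{
  for (i = 1; i <= NF; i++) {
    # Split each pipe-separated field into key and value on ":"
    split($i, a, ":")
    if (a[1] == "count") {
      print a[2]
    }
  }
}')

# Print each stored number on its own line.
printf '%s\n' "${nums[@]}"
```

With the two sample lines this stores 1 and 5, the same values as the command substitution version.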
    

    If you want to combine both separators, you can use the bracket expression [|:] as the field separator and print field number 8 for an exact positional match, as mentioned in the comments.

    Note that this does not check that the preceding field is actually count; it only prints whatever is in position 8:

    nums=($(echo -n "$grepResult" |  awk -F '[|:]' '{print $8}'))
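
If the fixed position should also be verified, one possible sketch is to print field 8 only when field 7 is literally `count`. This assumes the field layout of the sample lines. The second input line below is a hypothetical one with the models field missing, so that `count` is no longer in position 7.

```bash
#!/usr/bin/env bash
# Hypothetical input: the second line lacks the models field,
# so its field 7 is "ids" rather than "count".
grepResult='Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320
Application.||dates:[2022-11-12]|count:5|ids:2320'

# Split on either "|" or ":" and print field 8 only if field 7 is "count".
nums=($(printf '%s\n' "$grepResult" | awk -F '[|:]' '$7 == "count" { print $8 }'))

printf '%s\n' "${nums[@]}"
```

Only the first line passes the check here, so the array holds the single value 1 instead of silently picking up 2320 from the second line.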
    

    With GNU awk you can use capture groups for a more precise match: the first and third groups require count: and its digits to be delimited on each side by either a pipe character or the start/end of the line, and the second group captures one or more digits:

    nums=($(echo -n "$grepResult" | awk 'match($0, /(^|\|)count:([0-9]+)(\||$)/, a) {print a[2]}' ))
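
The three-argument form of `match()` is a GNU awk extension. As a rough sketch of a variant for other awk implementations, the line can be padded with a pipe on both sides so that a plain two-argument `match()` with `RSTART` and `RLENGTH` covers the start and end of line cases too. The `line` variable and the sample `grepResult` value are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sample input standing in for the real $grepResult.
grepResult='Application.||dates:[2022-11-12]|models:[MODEL1]|count:1|ids:2320
Application.||dates:[2022-11-12]|models:[MODEL1]|count:5|ids:2320'

nums=($(printf '%s\n' "$grepResult" | awk '
{
  # Pad with "|" so count: may also sit at the very start or end of the line.
  line = "|" $0 "|"
  if (match(line, /\|count:[0-9]+\|/)) {
    # Skip the leading "|count:" (7 characters) and drop the trailing "|".
    print substr(line, RSTART + 7, RLENGTH - 8)
  }
}'))

printf '%s\n' "${nums[@]}"
```

Like the GNU awk version, this reports only the first delimited count: value on each line.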