bash shell csv

Extract strings between square brackets and arrange them as CSV


I need to extract the strings enclosed in square brackets from multiple lines of output and print them as CSV, one row per input line.

This is what I tried.

#!/bin/bash

input_string="Group [BO_analytics_Corp] in security domain [Native]: [+r+w].
Group [Tooling_Ops] in security domain [Met_LDAP]: [+r+w].
Group [Gov_tools] in security domain [Met_LDAP]: [+r+w+g]."

echo $(echo $input_string | grep -oP '(?<=\[).*?(?=\])' | tr '\n' ',')

This is what I am expecting.

BO_analytics_Corp,Native,+r+w
Tooling_Ops,Met_LDAP,+r+w
Gov_tools,Met_LDAP,+r+w+g

But this is what I am getting when I run the script.

BO_analytics_Corp,Native,+r+w,Tooling_Ops,Met_LDAP,+r+w,Gov_tools,Met_LDAP,+r+w+g,

Solution

  • Here's a very simple approach using awk. It uses the square brackets as field delimiters and prints the second, fourth, and sixth fields.

    awk -F '[][]' -v OFS=, '{print $2,$4,$6}' <<< "$input_string"
    

    If the number of bracketed strings per line is unknown, here's an alternative that is still simple. GNU Awk's FPAT variable defines fields by a regex matching their content (here, one bracketed string), and a loop strips the enclosing brackets from every field:

    gawk -v FPAT='\\[[^]]*\\]' -v OFS=, '{
      # strip the leading "[" and trailing "]" from each field
      for (i=1; i<=NF; i++) $i = substr($i, 2, length($i)-2)
      print   # assigning to a field rebuilds $0 with OFS
    }' <<< "$input_string"
    

    Output:

    BO_analytics_Corp,Native,+r+w
    Tooling_Ops,Met_LDAP,+r+w
    Gov_tools,Met_LDAP,+r+w+g
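
As for why the original attempt prints a single line: the unquoted `$input_string` is word-split by the shell, so `echo` emits everything on one line, and `tr '\n' ','` then turns every newline that `grep -o` prints into a comma, so the boundaries between records are lost. If neither GNU grep nor gawk is available, a pure-Bash version might look like the sketch below. It is not part of the answer above. It swaps in Bash's `[[ =~ ]]` regex matching and processes the input one line at a time, and the function name `brackets_to_csv` is just an illustrative choice.

```shell
#!/bin/bash

input_string="Group [BO_analytics_Corp] in security domain [Native]: [+r+w].
Group [Tooling_Ops] in security domain [Met_LDAP]: [+r+w].
Group [Gov_tools] in security domain [Met_LDAP]: [+r+w+g]."

# Hypothetical helper (the name is illustrative): reads lines on stdin and
# prints the bracketed strings of each line as one CSV row.
brackets_to_csv() {
  local re='\[([^]]*)\](.*)'   # first [...] group, then the rest of the line
  local line rest
  local -a fields
  while IFS= read -r line; do
    fields=()
    rest=$line
    # Peel off one bracketed string per iteration until none remain.
    while [[ $rest =~ $re ]]; do
      fields+=("${BASH_REMATCH[1]}")
      rest=${BASH_REMATCH[2]}
    done
    # Lines without any bracketed string produce no row.
    if ((${#fields[@]} > 0)); then
      # Join the fields with commas; the subshell keeps the IFS change local.
      (IFS=,; printf '%s\n' "${fields[*]}")
    fi
  done
}

# Quoting the variable preserves the newlines between records.
brackets_to_csv <<< "$input_string"
```

For the sample input this should print the same three rows as the awk commands. Like them, it assumes the bracketed strings contain no commas or nested brackets, so no CSV quoting is attempted.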