awk unicode ascii text-manipulation

How do I set a symbol inside the record separator of awk?


How do I include symbols in the record separator of awk? I know the basic syntax, like this:

awk 'BEGIN{RS="[:.!]"}{if (tolower($0) ~ "$" ) print $0 }'

which will split a single line into separate records on !, . and :, but I also want to include symbols such as a green checkmark (✅). I am having trouble understanding the syntax, so I put it in like this:

awk 'BEGIN{RS="[:.!\u2705]"}{if (tolower($0) ~ "$" ) print $0 }'

which doesn't seem to work.

Sample input is this:

✅  Team collaboration  ✅  Project organisation✅  SSO support✅  API Access✅  Priority Support 

Solution

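  • You need to use a regex with an alternation operator (|) because the character you want to split on, ✅ (U+2705), is encoded in UTF-8 as three separate code units (bytes): E2, 9C and 85. When awk matches byte by byte, a bracket expression such as [:.!] matches only a single byte, so the three-byte sequence has to be its own alternative.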

    You can use

    awk 'BEGIN{RS="[:.!]|\xE2\x9C\x85"} tolower($0) ~ "$"'
    

    See the online demo:

    #!/bin/bash
    s='✅ Team collaboration ✅ Project organisation✅ SSO support✅ API Access✅ Priority Support'
    awk 'BEGIN{RS="[:.!]|\xE2\x9C\x85"} tolower($0) ~ "$"' <<< "$s"
    

    Output:

    
     Team collaboration 
     Project organisation
     SSO support
     API Access
     Priority Support
    

    Note that print $0 is the default action, so there is no need to write it explicitly.
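
As a rough sketch of the points above, the following checks the UTF-8 bytes of the check mark with od and then splits the sample while trimming surrounding whitespace and dropping empty records. It assumes the script is saved as UTF-8 and that your awk accepts a regex in RS (gawk, mawk and busybox awk do; POSIX only guarantees a single-character RS). The variable names bytes and items are just illustrative, and the separator is passed as a literal character via -v instead of \x escapes, which is an alternative spelling of the same alternation.

```shell
# Dump the bytes of the check mark (U+2705) as hex; these are what RS must match.
bytes=$(printf '%s' '✅' | od -An -tx1 | tr -d ' \n')
echo "$bytes"

s='✅ Team collaboration ✅ Project organisation✅ SSO support✅ API Access✅ Priority Support'

# Split on :, ., ! or the check mark. The first rule trims leading and
# trailing blanks/newlines from each record; the second prints only
# records that are still non-empty after trimming.
items=$(printf '%s\n' "$s" |
  awk -v RS='[:.!]|✅' '{ gsub(/^[ \t\n]+|[ \t\n]+$/, "") } length($0)')
echo "$items"
```

The first echo should show e29c85, matching the E2, 9C and 85 bytes named in the answer. The trimming step is optional; it only removes the stray spaces and the empty leading record visible in the output above.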