sed delimiter cut

How to use cut with multiple character delimiter in Unix?


My file looks like this

abc ||| xyz ||| foo bar
hello world ||| spam ham jam ||| blah blah

I want to extract a specific column. For example, I could do this:

sed 's/\s|||\s/\t/g' file | cut -f1

But is there another way of doing that?


Solution

  • Since | is a regex metacharacter (the alternation operator), it needs to be escaped as \\| or put in square brackets: [|].

    You can do this:

    awk -F' \\|\\|\\| ' '{print $1}' file
    

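    Some other variations that work as well (the {3} forms need an awk that supports interval expressions; older gawk versions need --re-interval for that):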

    awk -F' [|][|][|] ' '{print $1}' file
    awk -F' [|]{3} ' '{print $1}' file
    awk -F' \\|{3} ' '{print $1}' file
    awk -F' \\|+ ' '{print $1}' file
    awk -F' [|]+ ' '{print $1}' file
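
To try the bracket form on the sample data from the question, here is a minimal sketch. The helper name extract_col and the temporary file are assumptions made for illustration; they are not part of the original answer.

```shell
# Recreate the two sample lines from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' 'abc ||| xyz ||| foo bar' \
              'hello world ||| spam ham jam ||| blah blah' > "$sample"

# extract_col (hypothetical helper): $1 = column number, $2 = input file.
# The separator ' [|][|][|] ' matches a space, three literal pipes, a space.
extract_col() {
    awk -F' [|][|][|] ' -v col="$1" '{ print $col }' "$2"
}

first=$(extract_col 1 "$sample")    # first column of every line
second=$(extract_col 2 "$sample")   # second column of every line
printf '%s\n' "$first" "$second"

rm -f "$sample"
```

Passing the column number with -v keeps the awk program in single quotes, so the shell never touches $col.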
    

    A backslash (\) as separator is harder: putting it in square brackets does not remove the need for escaping, and you need many escape characters :)

    cat file
    abc \\\ xyz \\\ foo bar
    

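    Example: you need 4 \ for every literal \ in the expression, so 12 \ in total. awk first unescapes the -F string (\\\\ becomes \\), and then the regex engine unescapes it again (\\ becomes \).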

    awk -F' \\\\\\\\\\\\ ' '{print $2}' file
    xyz
    

    or

    awk -F' \\\\{3} ' '{print $2}' file
    xyz
    

    or this, but it's not much simpler:

    awk -F' [\\\\]{3} ' '{print $2}' file
    xyz
    
    awk -F' [\\\\][\\\\][\\\\] ' '{print $2}' file
    xyz
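
The backslash counts above can be checked without creating a file. This is a rough sketch: the variable names are made up for illustration, and the first command only prints the separator as awk stores it after processing the -F argument, so you can see how many backslashes survive that first pass.

```shell
# Sample line with three backslashes between the fields; inside single
# quotes the shell leaves backslashes untouched.
line='abc \\\ xyz \\\ foo bar'

# Print FS as awk stores it after its own escape pass on the -F argument.
stored_fs=$(awk -F'\\\\' 'BEGIN { print FS }')
printf 'FS as stored by awk: %s\n' "$stored_fs"

# Twelve backslashes on the command line, halved by awk's string pass,
# halved again by the regex engine: three literal backslashes.
plain=$(printf '%s\n' "$line" | awk -F' \\\\\\\\\\\\ ' '{ print $2 }')

# Bracket form: each [\\\\] reaches the regex engine as [\\],
# which matches a single backslash.
bracket=$(printf '%s\n' "$line" | awk -F' [\\\\][\\\\][\\\\] ' '{ print $2 }')

printf '%s\n' "$plain" "$bracket"
```

Both forms avoid the {3} interval, so they should also behave the same on awk implementations without interval-expression support.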