bash shell ubuntu gnu-coreutils

How to split a string on a multi-character delimiter in Bash?


Why doesn't the following Bash code work?

for i in $( echo "emmbbmmaaddsb" | split -t "mm"  )
do
    echo "$i"
done

Expected output:

e
bb
aaddsb

Solution

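  • split is the wrong tool for this: it cuts its input into output files rather than printing fields, and the -t option of GNU split accepts only a single-character record separator. Since you're expecting one piece per line, you can simply replace every instance of mm in your string with a newline. In pure Bash: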

    in='emmbbmmaaddsb'
    sep='mm'
    printf '%s\n' "${in//$sep/$'\n'}"
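
    To loop over the pieces as the question does, the same replacement can feed an array. This is a minimal sketch, assuming Bash 4 or later for mapfile; the array name parts is only illustrative. Quoting "$sep" inside the expansion makes Bash match the separator literally instead of as a glob pattern, which matters for separators such as '*' or '??'.

    ```shell
    #!/usr/bin/env bash
    in='emmbbmmaaddsb'
    sep='mm'

    # Replace each literal occurrence of $sep with a newline, then read
    # the resulting lines into an array (mapfile needs Bash 4+).
    mapfile -t parts <<< "${in//"$sep"/$'\n'}"

    # Iterate over the pieces; quoting the array keeps each piece intact.
    for part in "${parts[@]}"; do
        printf '%s\n' "$part"
    done
    ```

    For the sample string this prints e, bb and aaddsb on separate lines. Pieces that themselves contain newlines would be split further, since the newline is the stand-in separator.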
    

    If you need to do this replacement on a longer input stream, awk is a better fit, since bash's built-in string manipulation doesn't scale well beyond a few kilobytes of content. The gsub_literal shell function given in BashFAQ #21, which delegates the work to awk, is applicable:

    # Taken from http://mywiki.wooledge.org/BashFAQ/021
    
    # usage: gsub_literal STR REP
    # replaces all instances of STR with REP. reads from stdin and writes to stdout.
    gsub_literal() {
      # STR cannot be empty
      [[ $1 ]] || return
    
      # string manip needed to escape '\'s, so awk doesn't expand '\n' and such
      awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
        # get the length of the search string
        BEGIN {
          len = length(str);
        }
    
        {
          # empty the output string
          out = "";
    
          # continue looping while the search string is in the line
          while (i = index($0, str)) {
            # append everything up to the search string, and the replacement string
            out = out substr($0, 1, i-1) rep;
    
            # remove everything up to and including the first instance of the
            # search string from the line
            $0 = substr($0, i + len);
          }
    
          # append whatever is left
          out = out $0;
    
          print out;
        }
      '
    }
    

    ...used, in this context, as:

    gsub_literal "mm" $'\n' <your-input-file.txt >your-output-file.txt
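
The output does not have to go to a file; it can also be consumed piece by piece in a loop. The sketch below is one way to do that, assuming Bash (for process substitution) and an awk that accepts a literal newline in a -v value, as gawk and mawk do. It repeats a condensed copy of gsub_literal only so that it runs on its own; the array name pieces is illustrative.

```shell
#!/usr/bin/env bash
# Condensed copy of gsub_literal from BashFAQ #21 (comments removed),
# repeated here only so that this sketch is self-contained.
gsub_literal() {
  [[ $1 ]] || return
  awk -v str="${1//\\/\\\\}" -v rep="${2//\\/\\\\}" '
    BEGIN { len = length(str) }
    {
      out = ""
      while (i = index($0, str)) {
        out = out substr($0, 1, i-1) rep
        $0 = substr($0, i + len)
      }
      print out $0
    }
  '
}

# Collect the separated pieces from the stream, one per line.
pieces=()
while IFS= read -r piece; do
  pieces+=("$piece")
done < <(printf '%s\n' 'emmbbmmaaddsb' | gsub_literal 'mm' $'\n')

printf '%s\n' "${pieces[@]}"
```

Feeding the loop through process substitution rather than a pipe keeps it in the current shell, so the pieces array is still populated after the loop ends.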