shell perl

Replacing repeated arbitrary characters with a regular expression


The original data contains strings that are each repeated four times, separated by spaces. For example,

code2 1 1 1 1 7 7 7 7 10 10 10 10 eq
code9 a a a a tpp1 tpp1 tpp1 tpp1 es

I'd like to add a numeric suffix to each repeat using Perl or Linux shell scripting, but I'm having difficulty matching the repeats correctly.

The desired result is:

code2 1[1] 1[2] 1[3] 1[4] 7[1] 7[2] 7[3] 7[4] 10[1] 10[2] 10[3] 10[4] eq
code9 a[1] a[2] a[3] a[4] tpp1[1] tpp1[2] tpp1[3] tpp1[4] es

Could you suggest implementation ideas or a regular expression for this case?


Solution

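  • A two-pass approach works well here: the outer substitution finds each run of four identical tokens, and an inner substitution numbers the tokens in that run. I'd use something like this: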

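    s{
       (?: ^ | \s )       # start of line, or whitespace before the token
       \K                 # leave everything before this point out of the match
       ( \S+ )            # a token, captured as group 1
       (?: \s+ \1 ){3}    # the same token three more times
       (?= \s | $ )       # the last copy must end at whitespace or end of line
    }{
       my $i = 0;
       # Second pass: append a counter to each token in the matched run.
       $& =~ s/\S+\K/ "[".(++$i)."]" /ger
    }xge;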
    

    Demo:

    {
       echo 'code2 1 1 1 1 7 7 7 7 10 10 10 10 eq'
       echo 'code9 a a a a tpp1 tpp1 tpp1 tpp1 es'
    } |
    perl -pe'
       s{
          (?: ^ | \s )
          \K
          ( \S+ )
          (?: \s+ \1 ){3}
          (?= \s | $ )
       }{
          my $i = 0;
          $& =~ s/\S+\K/ "[".(++$i)."]" /ger   
       }xge
    '
    code2 1[1] 1[2] 1[3] 1[4] 7[1] 7[2] 7[3] 7[4] 10[1] 10[2] 10[3] 10[4] eq
    code9 a[1] a[2] a[3] a[4] tpp1[1] tpp1[2] tpp1[3] tpp1[4] es
    

    The program can be squished into a single line if you so desire.

    s{(?:^|\s)\K(\S+)(?:\s+\1){3}(?=\s|$)}{$i=0;$&=~s/\S+\K/"[".(++$i)."]"/ger}ge