perl

Perl replace multiple strings simultaneously (case insensitive)


Consider the following Perl code, which works perfectly:

%replacements = ("what" => "its", "lovely" => "bad");
($val = $sentence) =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/g;

Stack Overflow user sresevoir brilliantly came up with that replacement code, which uses a hash to find and replace multiple terms without iterating through a loop.

I've been throwing various other search-and-replace terms at it programmatically, and I've started using it to highlight words that match a search.

The problem (refer to problem code shown below):

I made it case insensitive by adding an "i" before the "g" at the end. If the search term $thisterm and the matching word in $sentence have the same case, there is no problem. If they differ in case (e.g. $thisterm is Stackoverflow and $sentence contains stackoverflow), the matched word is replaced with nothing. It's as if I had told it to

$sentence =~ s/$thisterm//g;

Here's the problem code:

foreach $thisterm (@searchtermarray) {
    # The variable $thisterm has already gone through a filter to remove special characters.
    $thistermtochange = $thisterm;
    $replacements{$thistermtochange} = "<span style=\"background-color:#FFFFCC;\">$thistermtochange</span>";
}

$sentence =~ s/(@{[join "|", keys %replacements]})/$replacements{$1}/ig;

I also went back and reproduced the problem with the original code above. It seems the combination of the i modifier, a hash lookup in the replacement, and a difference in case is something Perl doesn't like.

What am I missing?


Solution

  • Keep all the keys of the hash in lower case, and do this:

    s/(@{[join "|", keys %replacements]})/$replacements{ lc $1 }/ig
    

    (note the addition of lc)
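
    The reason the original returns nothing: with /i the pattern matches regardless of case, but $1 holds the text exactly as it appears in the sentence, so $replacements{$1} looks up a key that does not exist and interpolates as an empty string. A minimal sketch of the fix, assuming a made-up sentence and terms (the sample data is illustrative, not from the question):

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data; keys are stored in lower case.
    my %replacements = (what => "its", lovely => "bad");
    my $sentence = "What a LOVELY day";

    # Build the alternation once, then look up the lower-cased match.
    my $pattern = join "|", keys %replacements;
    (my $val = $sentence) =~ s/($pattern)/$replacements{ lc $1 }/ig;

    print "$val\n";    # its a bad day
    ```

    In the loop from the question, this would mean storing each entry under `lc $thisterm` rather than `$thisterm`.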

    There are a few other things you ought to consider.

    First, as is, if you are trying to replace both lovely and love with different replacements, lovely may or may not ever be found, depending on which key is returned by keys first. To prevent this, it's a good idea to sort by descending length:

    s/(@{[join "|", sort { length $b <=> length $a } keys %replacements]})/$replacements{ lc $1 }/ig
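
    To see why the ordering matters, here is a small sketch with one key that is a prefix of another. The keys, replacements, and sentence are hypothetical; the lower-cased lookup from the first fix is kept so the /i modifier still works:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data: "love" is a prefix of "lovely".
    my %replacements = (love => "like", lovely => "nice");
    my $sentence = "I love this Lovely day";

    # Longest keys first, so "lovely" is tried before "love".
    my $pattern = join "|", sort { length $b <=> length $a } keys %replacements;
    (my $val = $sentence) =~ s/($pattern)/$replacements{ lc $1 }/ig;

    print "$val\n";    # I like this nice day
    ```

    Without the sort, the result would depend on hash ordering: if "love" came first in the alternation, "Lovely" would become "likely".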
    

    Second, this technique only works with fixed strings; if your keys contain any regex metacharacters, for instance replacing how? with why?, it will fail, because $1 will never be how?. To allow metacharacters (interpreted as literal characters), quote them:

    s/(@{[join "|", map quotemeta, sort { length $b <=> length $a } keys %replacements]})/$replacements{ lc $1 }/ig
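
    A short sketch of the quoted version, again with made-up data. quotemeta escapes the ? only inside the pattern; $1 is still the literal matched text, so the hash lookup uses the unescaped key:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data: one key contains a regex metacharacter.
    my %replacements = ("how?" => "why?", "how" => "so");
    my $sentence = "How? That is how.";

    # Escape metacharacters, longest keys first: the pattern is how\?|how
    my $pattern = join "|", map quotemeta,
                  sort { length $b <=> length $a } keys %replacements;
    (my $val = $sentence) =~ s/($pattern)/$replacements{ lc $1 }/ig;

    print "$val\n";    # why? That is so.
    ```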
    

    From your comment, it seems to me that you want to find certain strings, all in one pass, and add stuff around them (that doesn't vary by which string). If so, you are going about it the hard way and shouldn't be using a hash at all. Have an array of the strings you want to search for and replace them:

    s/(@{[join "|", map quotemeta, sort { length $b <=> length $a } @search_strings]})/<span style="background-color:#FFFFCC;">$1<\/span>/ig;
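
    Put together as a runnable sketch, assuming a hypothetical @search_strings and sentence. Because the replacement reuses $1, each highlighted word keeps the case it had in the sentence, and no hash lookup is involved:

    ```perl
    use strict;
    use warnings;

    # Hypothetical search terms and text.
    my @search_strings = ("Stackoverflow", "perl");
    my $sentence = "I asked about Perl on stackoverflow.";

    # Escape metacharacters and try the longest terms first.
    my $pattern = join "|", map quotemeta,
                  sort { length $b <=> length $a } @search_strings;

    # Wrap every case-insensitive match in the highlighting span.
    $sentence =~ s/($pattern)/<span style="background-color:#FFFFCC;">$1<\/span>/ig;

    print "$sentence\n";
    ```

    If the list of search strings can be empty, it is probably worth skipping the substitution, since an empty alternation matches at every position in the string.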