php str-replace findandmodify

str_replace() not working for the following case


I would like to use str_replace() to place span elements around html strings for the purpose of highlighting them.

However, the following does not work when there is a &nbsp; inside the string. I've tried replacing the &nbsp; with ' ', but this did not help.


LIVE example

You can recreate the problem using the below code:

$str_to_replace = "as a way to incentivize more purchases.";

$replacement = "<span class='highlighter'>as a way to incentivize&nbsp;more purchases.</span>";

$subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");

$output = str_replace($str_to_replace,$replacement,$subject);

.highlighter{
    background-color: yellow;
}

Solution

  • So I tried your code and ran into the same problem you did. The cause is that there is actually another character between the "e" in "incentivize" and " more". You can see it by splitting $subject into two parts, the text up to "to incentivize" and the text after it:

    // splits the webpage into two parts
    $x = explode('to incentivize', $subject);
    
    // print the value of the first byte of the second string
    // (the byte right after the second e in incentivize) and also
    // print the rest of the webpage following this mystery character
    exit("keycode of invisible character: " . ord($x[1]) . " " . $x[1]);
    

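    which prints: keycode of invisible character: 194 Â more ... There's our mystery character, and its first byte has the value 194 (0xC2). That is most likely the lead byte of a UTF-8 encoded non-breaking space (bytes 0xC2 0xA0), which would also explain why it shows up as "Â " when the output is displayed in a single-byte encoding.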
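    As a quick check of where a value of 194 can come from, the sketch below decodes the &nbsp; entity into a real character and looks at its bytes. It assumes the page is served as UTF-8; the variable names are only for illustration.

    ```php
    <?php
    // Decode the HTML entity into an actual UTF-8 non-breaking space.
    $nbsp = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // In UTF-8 a non-breaking space is two bytes: 0xC2 0xA0.
    $hex = bin2hex($nbsp);

    // ord() only looks at the first byte of the string, which is 0xC2 = 194.
    $firstByte = ord($nbsp);

    echo "bytes: $hex, first byte: $firstByte\n";
    ```

    If the page contains these raw bytes rather than the literal text "&nbsp;", a search string with a plain space (or with the entity) will never match.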

    Perhaps this website embeds these characters to make it difficult to do exactly what you're doing, or perhaps it's just a bug. In any case, you can use preg_replace instead of str_replace and change $str_to_replace like so:

    // match the trailing period too, since the replacement already includes it
    $str_to_replace = '/as a way to incentivize(.*?)more purchases\./';
    
    $replacement = "<span class='highlighter'>as a way to incentivize more purchases.</span>";
    
    $subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");
    
    $output = preg_replace($str_to_replace,$replacement,$subject);
    

    and now this does what you want. The (.*?) handles the mysterious hidden character. You could tighten this regular expression further, for example by capping the gap at a maximum number of characters with (.{0,5}), but in either case you likely want to stay flexible.
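    If the hidden character really is a UTF-8 non-breaking space, another option is to keep str_replace() and normalize the subject first. This is only a sketch under that assumption: the $subject below is a made-up sample standing in for the fetched page, and $needle and $normalized are illustrative names.

    ```php
    <?php
    // Hypothetical sample standing in for the fetched page;
    // "\xC2\xA0" is a UTF-8 encoded non-breaking space.
    $subject = "<p>Discounts as a way to incentivize\xC2\xA0more purchases.</p>";

    $needle = "as a way to incentivize more purchases.";
    $replacement = "<span class='highlighter'>" . $needle . "</span>";

    // Turn both the raw UTF-8 bytes and the literal entity into ordinary spaces.
    $normalized = str_replace(["\xC2\xA0", "&nbsp;"], " ", $subject);

    // With the subject normalized, a plain string match is enough.
    $output = str_replace($needle, $replacement, $normalized);

    echo $output, "\n";
    ```

    Note that this converts every non-breaking space in the page to a regular space, which may change how the rest of the document wraps. The regular expression above leaves the rest of the page untouched.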