php html filter data-scrubbing

Removing characters from a PHP String


I'm accepting a string from a feed for display on the screen that may or may not include some rubbish I want to filter out. I don't want to filter normal symbols at all.

The values I want to remove look like this: �

It is only this that I want removed. Relevant technology is PHP.

Suggestions appreciated.


Solution

  • Thanks for the responses, guys. Unfortunately, those submitted had the following problems:

    This one is wrong for obvious reasons (it strips every symbol and all whitespace, which I want to keep):

    ereg_replace("[^A-Za-z0-9]", "", $string);
    

    This:

    s/[\u00FF-\uFFFF]//
    

    which also uses the deprecated ereg form of regex, didn't work either when I converted it to preg, because the range was simply too large for the regex to handle. There are also holes in that range that would allow rubbish to seep through.

    This suggestion:

    This is an encoding problem; you shouldn't try to clean those bogus characters but understand why you're receiving them scrambled.

    while valid, is no good because I don't have any control over how the data I receive is encoded. It comes from an external source. Sometimes there's garbage in there and sometimes there is not.

    So, the solution I came up with was relatively dirty, but in the absence of something more robust I'm just accepting all standard letters, numbers and symbols and discarding the rest.

    This does seem to work for now. The solution is as follows:

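    // Convert currency symbols to HTML entities so the whitelist below keeps them.
    $fixT = str_replace("£", "&pound;", $string);
    $fixT = str_replace("€", "&euro;", $fixT);
    // Drop everything except letters, digits, whitespace and standard ASCII symbols.
    $fixT = preg_replace("/[^a-zA-Z0-9\s\.\/:!\[\]\*\+\-\|\<\>@#\$%\^&\(\)_=\';,'\?\\\{\}`~\"]/", "", $fixT);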
    

    If anyone has any better ideas I'm still keen to hear them. Cheers.
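
A less lossy alternative to the whitelist might be to remove only the garbage itself: bytes that are not valid UTF-8, plus any literal U+FFFD replacement characters already present in the data. The sketch below assumes the feed is meant to be UTF-8, that the mbstring extension is loaded, and that PHP 7 or later is in use (for the `"\u{FFFD}"` escape). The function name `strip_garbage` is hypothetical and not part of the original post.

```php
<?php
// Hypothetical helper, not from the original post. Assumes the input is
// supposed to be UTF-8 and that the mbstring extension is available.
function strip_garbage(string $text): string
{
    // Remember the current substitute character so it can be restored.
    $previous = mb_substitute_character();

    // 'none' tells mbstring to drop invalid byte sequences instead of
    // replacing them with '?'.
    mb_substitute_character('none');
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    // Remove U+FFFD replacement characters that were already in the data.
    return str_replace("\u{FFFD}", '', $clean);
}

// Example: valid non-ASCII symbols such as £ and € survive untouched.
echo strip_garbage("Price: £5 \u{FFFD}\xFFand €10"), "\n";
```

Unlike the whitelist, this keeps accented letters and currency symbols as they are, so the `str_replace` calls for £ and € would not be needed. It will not help if the feed arrives in some other encoding entirely; in that case the text would have to be converted to UTF-8 first.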