php, regex, string, sanitization, delimited

Sanitize and standardize a string that contains an indeterminate sequence of delimiting characters and whitespace


I have a PHP variable that comes from a form and needs tidying up.

The variable contains a list of items (some items are two or three words, separated by a space).

I want to convert it to a comma-separated list with no superfluous whitespace. The divisions should fall only at commas, semicolons or newlines, and an item can never be blank.

Here's a comprehensive example (with a deliberately messy input):

Input string:

$input = 'dog, cat         ,car,tea pot,,  ,,, ;;
fly,     cake';

Desired result string:

dog,cat,car,tea pot,fly,cake

Solution

  • You can start by splitting the string into its "useful" parts with preg_split, and then implode those parts back together:

    $str_in = "dog, cat         ,car,tea pot,,  ,,, ;;
    fly,     cake";
    
    $parts = preg_split('/[,;\s]/', $str_in, -1, PREG_SPLIT_NO_EMPTY);
    
    $str_out = implode(',', $parts);
    
    var_dump($parts, $str_out);
    

    (Here, the regex splits on ',', ';' and '\s', which matches any whitespace character, and the PREG_SPLIT_NO_EMPTY flag keeps only the non-empty parts.)

    This gives, for $parts:

    array
      0 => string 'dog' (length=3)
      1 => string 'cat' (length=3)
      2 => string 'car' (length=3)
      3 => string 'tea' (length=3)
      4 => string 'pot' (length=3)
      5 => string 'fly' (length=3)
      6 => string 'cake' (length=4)
    

    And, for $str_out:

    string 'dog,cat,car,tea,pot,fly,cake' (length=28)
    



    Edit after the comment: sorry, I missed that items can contain spaces ^^

    In that case, you can't split on whitespace :-( I would split only on ',', ';' or a line break, then iterate over the parts, using trim to remove whitespace at the beginning and end of each item, and keep only those that are not empty:

    $useful_parts = array();
    // Split on commas, semicolons and line breaks only, so spaces inside an item survive
    $parts = preg_split('/[,;\r\n]/', $str_in, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($parts as $part) {
        $part = trim($part);
        // Compare against '' rather than using empty(), which would also drop an item like "0"
        if ($part !== '') {
            $useful_parts[] = $part;
        }
    }
    var_dump($useful_parts);
    


    Executing this portion of code gets me:

    array
      0 => string 'dog' (length=3)
      1 => string 'cat' (length=3)
      2 => string 'car' (length=3)
      3 => string 'tea pot' (length=7)
      4 => string 'fly' (length=3)
      5 => string 'cake' (length=4)
    


    And imploding it all back together with ',', this time I get:

    string 'dog,cat,car,tea pot,fly,cake' (length=28)
    

    Which is better ;-)
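
As a more compact variant of the split-then-trim idea above, the trimming can be folded into the delimiter pattern itself, so that whitespace around each delimiter is consumed as part of the split. This is only a sketch: the helper name normalize_list is hypothetical (it is not from the original answer), and the typed signature assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: the name normalize_list is an illustration only.
function normalize_list(string $input): string
{
    // Split on runs of commas, semicolons or line breaks, swallowing any
    // whitespace around them. Spaces inside an item ("tea pot") are kept,
    // because a space only counts when it is adjacent to a delimiter.
    // trim() handles whitespace at the very start and end of the input.
    $parts = preg_split('/\s*[,;\r\n]+\s*/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    return implode(',', $parts);
}

$input = "dog, cat         ,car,tea pot,,  ,,, ;;\nfly,     cake";
echo normalize_list($input), PHP_EOL;
```

With the sample input from the question, this should print dog,cat,car,tea pot,fly,cake. The explicit loop above is easier to extend (for example, to validate each item), while this version keeps everything in one preg_split call.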