php regex preg-split

Split a string on spaces and/or commas after a word which may be preceded by a small word


Here is my string:

$myString = "first second third,el forth, fiveeee, six";

What I want to capture is:

first
second
third
el forth
fiveeee
six

This is what I tried for regex to be used in preg_split:

 $myPattern = "/[\s,]+/";

The problem is that this splits "el" and "forth" into separate elements.

How can I make it keep "el forth" together?

Edit:

I was not clear. I want "el forth" to end up as a single array element. Because "EL" is so short, I treat it as part of the word that follows. For example:

`EL CLASSICO,SOMETHING DIFFERENT,SOMETHINGELSEHERE SOMEMORETEXT` should become:

* `EL CLASSICO`
* `SOMETHING DIFFERENT`
* `SOMETHINGELSEHERE`
* `SOMEMORETEXT`

The elements should be separated by spaces OR commas, but a short word like EL or LE should not trigger a split; it should stay attached to the word that follows it.


Solution

    <?php
    $myString = "first second third,el forth,del fiveeee,six,six seven,four six";
    $myPattern = "/\s*,\s*|(?<=[^\s,]{4})[\s,]+/";
    
    print_r(preg_split($myPattern, $myString));
    ?>
    

    This produces:

    [0] => first
    [1] => second
    [2] => third
    [3] => el forth
    [4] => del fiveeee
    [5] => six
    [6] => six seven
    [7] => four
    [8] => six
    

    `(?<=[^\s,]{4})` is a positive look-behind assertion. It succeeds only if the separator is immediately preceded by four non-separator characters; it does not consume those characters, it only checks that they exist. As a result, whitespace that follows a word shorter than four characters does not cause a split. A separator that contains a comma always splits, which is what the first alternative, `\s*,\s*`, is for.
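To make the four-character threshold easier to experiment with, the same pattern can be wrapped in a small helper. This is only a sketch: the function name `splitKeepingShortWords`, the `$minLength` parameter, and the use of `PREG_SPLIT_NO_EMPTY` are assumptions added for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): same regex as above,
// with the look-behind length made configurable.
function splitKeepingShortWords(string $text, int $minLength = 4): array
{
    // Alternative 1: a comma (with optional surrounding whitespace) always splits.
    // Alternative 2: other separators split only when the preceding
    // $minLength characters are all non-separator characters.
    $pattern = sprintf('/\s*,\s*|(?<=[^\s,]{%d})[\s,]+/', $minLength);

    // PREG_SPLIT_NO_EMPTY is an added assumption: it drops empty elements
    // that would otherwise appear for input such as a trailing comma.
    return preg_split($pattern, trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// The string from the question:
print_r(splitKeepingShortWords("first second third,el forth, fiveeee, six"));
// elements: first, second, third, el forth, fiveeee, six

// The second example from the question:
print_r(splitKeepingShortWords("EL CLASSICO,SOMETHING DIFFERENT,SOMETHINGELSEHERE SOMEMORETEXT"));
// elements: EL CLASSICO, SOMETHING, DIFFERENT, SOMETHINGELSEHERE, SOMEMORETEXT
```

Note that with this rule `SOMETHING DIFFERENT` is split in two, because `SOMETHING` is longer than the threshold. Only whitespace after a word shorter than `$minLength` characters is kept.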