
PHP mb_ereg_match faulty match


I am trying to match some text with PHP's mb_ereg_match, and I am using this regex to match all non-word chars:

/[^-\w.]|[_]/u

I want to be able to match Unicode chars, which is why I am using mb_ereg. With this input:

'γιωρ;γος.gr'

Which contains chars from the Greek alphabet. I want to match the ';' and, if it is matched, return -1; otherwise return the input.
Whatever I do, it doesn't match the ';' and returns the input.
I tried to use preg_match, but it doesn't work as I expect.
Any suggestions?

Edit 1
I did a test and found that it matches correctly if I change my input to:

';γος.gr'

It also works fine with Latin chars.

Edit 2
If I get any of the following inputs, I want to print -1:

'γιωρ;γος.gr'
';γος.gr'
'γιωρ;.gr'
';.gr'

Otherwise I want to get the input back unchanged.

Edit 3
I did some more tests, and it doesn't match any special char that is surrounded by multibyte UTF-8 chars.


Solution

  • You can use \X with preg_match_all to split the string into its individual Unicode chars:

    \X
    - an extended Unicode sequence

    Also, see this description of \X from Regular-Expressions.info:

    Matching a single grapheme, whether it's encoded as a single code point, or as multiple code points using combining marks, is easy in Perl, PCRE, PHP, and Ruby 2.0: simply use \X. You can consider \X the Unicode version of the dot. There is one difference, though: \X always matches line break characters, whereas the dot does not match line break characters unless you enable the dot matches newline matching mode.

    And you can use the following snippet then:

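    $re = '/\X/u';           // \X matches one grapheme; /u treats the input as UTF-8
    $str = "γιωρ;γος.gr";
    // Collect every grapheme of $str into $matches[0]
    preg_match_all($re, $str, $matches);
    if (in_array(";", $matches[0])) {
        echo -1;             // a ';' was found anywhere in the string
    } else {
        print_r($matches[0]);
    }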
    

    See IDEONE demo
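
A note on the original symptom: the PHP manual states that mb_ereg_match() matches only from the beginning of the string, which would be consistent with Edit 1, where the ';' is the first character. If the goal is to reject any input containing a char outside letters, digits, '.' and '-', a single preg_match call may be enough. The following is only a sketch under stated assumptions: the function name checkInput is hypothetical, \p{L} and \p{N} stand in for the \w of the question with '_' left out, and a preg_match error (for example, malformed UTF-8) is treated as invalid input.

```php
<?php
// Hypothetical helper (not from the original answer): returns -1 when the
// input contains any character other than a Unicode letter, a Unicode
// digit, '.' or '-', and returns the input unchanged otherwise.
function checkInput(string $input)
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number.
    // '_' belongs to neither class, so it is rejected just like ';'.
    $result = preg_match('/[^\p{L}\p{N}.-]/u', $input);

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    // Assumption: an error (e.g. malformed UTF-8) counts as invalid input.
    if ($result !== 0) {
        return -1;
    }
    return $input;
}

// The four inputs from Edit 2, plus one input without a special char.
foreach (['γιωρ;γος.gr', ';γος.gr', 'γιωρ;.gr', ';.gr', 'γιωργος.gr'] as $s) {
    echo checkInput($s), "\n";
}
```

Unlike the \X approach, this does not build an array of every char; it only reports whether a disallowed char occurs anywhere in the string.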