php regex quantifiers

Regular expression with [0-9]:[0-9] is not matching H:i formatted time string


After hours of trying to wrap my head around pattern matching, I'm throwing in the towel and turning to the experts...

I have a log file that I'm trying to extract a string from.

The format is like:

12:00 SomeText:
1:20 MoreText:

The "SomeText/MoreText" part is what I need to get. I've come up with the code below, but I'm not getting anything near the results I'm expecting:

$string = "12:00 SomeText: blah, blah, blah not important";
$regex = '/[0-9]:[0-9] (.*?)\: /';
$entity = preg_split($regex, $string);

The regex logic, as I understand it, is: any number, followed by a colon, followed by any number, followed by white space, the text, followed by a colon, followed by white space.


Solution

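  • You are matching a single digit, a colon, then a single digit. [0-9] without a quantifier matches exactly one digit, so "12:00" (two digits on each side) never fits. Also, preg_split splits the string around matches; to pull out a captured group, use preg_match instead: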

    $string = "12:00 SomeText: blah, blah, blah not important";
    $regex = '/[0-9]+:[0-9]+ ([^:]+)/';
    $entity = array();
    preg_match($regex, $string, $entity);
    

    This will match one or more digits, a colon, one or more digits, a space, and then capture everything up to (but not including) the next colon. preg_match puts the entire match (12:00 SomeText) in position 0 and the captured subexpressions after that, so your "SomeText" will be in $entity[1].
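
    As a rough check, the same pattern can be run over both line shapes from the question. This is only a sketch: the $lines array and the text after the second colon are made-up sample data, and $names is a hypothetical variable introduced for illustration.

    ```php
    <?php
    // Sample lines in the format shown in the question (assumed data).
    $lines = [
        "12:00 SomeText: blah, blah, blah not important",
        "1:20 MoreText: more details",
    ];
    $regex = '/[0-9]+:[0-9]+ ([^:]+)/';

    $names = [];
    foreach ($lines as $line) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (preg_match($regex, $line, $entity) === 1) {
            // $entity[0] is the whole match, $entity[1] the first capture group.
            $names[] = $entity[1];
        }
    }

    print_r($names); // expected to contain "SomeText" and "MoreText"
    ```

    Checking the return value before reading $entity[1] avoids an undefined-index notice on lines that do not start with a time.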

    [Edit] After the discussion in the comments, I've improved the matching against the header. Before, you had

    (.*?)\:
    

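    which is a lazy match: it starts with as few characters as possible and extends one character at a time until the colon and space that follow it can match. That works, but the engine has to re-test the rest of the pattern at every step. I've replaced it with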

    ([^:]+)
    

    which will match one or more characters that are not colons, so the capture stops just before the first colon after the time. A negated character class cannot run past that colon, so the regex never looks at "blah, blah, blah..." at all.
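
    To see the three patterns side by side, here is a minimal sketch. The variable names ($original, $lazy, $negated, $m0 to $m2) are hypothetical, and $lazy is an assumed variant that keeps the lazy group from the question but adds the + quantifiers.

    ```php
    <?php
    $string = "12:00 SomeText: blah, blah, blah not important";

    // Pattern from the question: exactly one digit on each side of the colon.
    $original = '/[0-9]:[0-9] (.*?)\: /';
    // Assumed variant: lazy capture kept, digit quantifiers fixed.
    $lazy = '/[0-9]+:[0-9]+ (.*?): /';
    // Negated character class from the answer.
    $negated = '/[0-9]+:[0-9]+ ([^:]+)/';

    $originalMatched = preg_match($original, $string, $m0);
    preg_match($lazy, $string, $m1);
    preg_match($negated, $string, $m2);

    var_dump($originalMatched); // 0: no single-digit time followed by a space
    var_dump($m1[1]);           // capture from the lazy version
    var_dump($m2[1]);           // capture from the negated-class version
    ```

    Both corrected patterns should capture the same text here; the negated class simply states the "stop at the colon" rule directly instead of relying on the lazy quantifier.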