php regex preg-match text-extraction sanitization

Get sequences of numeric values which have a distinctive starting format as a sanitized array of delimited strings


The given string is "38min *08min 38 *08min 36 *01min *26 50 *15min *41 *11min *41"

I'm trying to parse this string in PHP so that I get something like:

arr[0] = "38"
arr[1] = "08, 38"
arr[2] = "08, 36"
arr[3] = "01, 26, 50"
arr[4] = "15, 41"
arr[5] = "11, 41"

Solution

  • Since the * delimiters seem to be placed inconsistently, I would use an awfully complex regex for that:

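    $string = "38min *08min 38 *08min 36 *01min *26 50 *15min *41 *11min *41";

    // Group 1: the number before "min". Groups 2 and 3: up to two
    // following numbers that are not themselves followed by "min".
    preg_match_all('#
            (\d+)min[\s*]+
            (?:
                (\d+)(?!min|\d)
                    (?:
                       [\s*]+(\d+)(?!min|\d)
                    )?
            )?#x',
            $string, $matches, PREG_SET_ORDER);
    print_r($matches);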
    

    This gives you:

    Array
    (
        [0] => Array
            (
                [0] => 38min *
                [1] => 38
            )
    
        [1] => Array
            (
                [0] => 08min 38
                [1] => 08
                [2] => 38
            )
    
        [2] => Array
            (
                [0] => 08min 36
                [1] => 08
                [2] => 36
            )
    
        [3] => Array
            (
                [0] => 01min *26 50
                [1] => 01
                [2] => 26
                [3] => 50
            )
    
        [4] => Array
            (
                [0] => 15min *41
                [1] => 15
                [2] => 41
            )
    
        [5] => Array
            (
                [0] => 11min *41
                [1] => 11
                [2] => 41
            )
    
    )
    

    You'll have to join the [1], [2] and [3] entries of each match yourself to build the strings you want.
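
The reassembly step could look roughly like the sketch below. It assumes the input string from the question and a ", " separator; the $arr variable name is only illustrative. It relies on preg_match_all leaving out trailing groups that did not match, so everything after index 0 in each match is a captured number.

```php
<?php
$string = "38min *08min 38 *08min 36 *01min *26 50 *15min *41 *11min *41";

preg_match_all('#
        (\d+)min[\s*]+
        (?:
            (\d+)(?!min|\d)
                (?:
                   [\s*]+(\d+)(?!min|\d)
                )?
        )?#x',
        $string, $matches, PREG_SET_ORDER);

$arr = [];
foreach ($matches as $match) {
    // $match[0] is the full matched text; the captured numbers start at index 1.
    // Join them with ", " (separator chosen here as an assumption).
    $arr[] = implode(', ', array_slice($match, 1));
}

print_r($arr);
```

Note that the pattern as written captures at most two plain numbers after each "min" value, so a longer run of numbers would need another optional group or a different approach.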