phptokenize

PHP Tokens From a String


Let's say you have a string that looks like this: token1 token2 tok3

And you want to get all of the tokens (specifically, the strings between the spaces), and also their position (offset) and length.

So I would want a result that looks something like this:

array(
    array(
        'value'  => 'token1',
        'offset' => 0,
        'length' => 6,
    ),
    array(
        'value'  => 'token2',
        'offset' => 7,
        'length' => 6,
    ),
    array(
        'value'  => 'tok3',
        'offset' => 14,
        'length' => 4,
    ),
)

I know this can be done by looping through the characters of the string, and I could write a function to do that myself.

I am wondering, does PHP have anything built-in that will do this efficiently or at least help with part of this?

I am looking for suggestions and appreciate any help given. Thanks


Solution

  • You can use preg_match_all with the PREG_OFFSET_CAPTURE flag:

    $str = 'token1 token2 tok3';
    preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);
    var_dump($matches);
    

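    Each entry in $matches[0] is a pair of the matched text and its byte offset, so map each entry into the shape you want:

    function update($match) {
        // $match[0] is the matched text, $match[1] is its byte offset in $str.
        return array('value' => $match[0], 'offset' => $match[1], 'length' => strlen($match[0]));
    }
    // array_map returns a new array and leaves $matches[0] unchanged, so keep the result.
    $tokens = array_map('update', $matches[0]);
    var_dump($tokens);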
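
Putting the two steps together, a minimal self-contained sketch might look like the following. The function name tokenize_with_offsets is hypothetical (it does not appear in the answer above), and the sketch assumes tokens are separated by any whitespace, not only spaces. Note that both PREG_OFFSET_CAPTURE and strlen count bytes, so offsets and lengths will not be character counts for multibyte (for example UTF-8) input.

```php
<?php
// Hypothetical helper name, used for illustration only.
function tokenize_with_offsets(string $str): array
{
    // \S+ matches each run of non-whitespace characters.
    // PREG_OFFSET_CAPTURE turns every match into a [text, byteOffset] pair.
    preg_match_all('/\S+/', $str, $matches, PREG_OFFSET_CAPTURE);

    // Reshape each [text, byteOffset] pair into the structure from the question.
    return array_map(function (array $match): array {
        return array(
            'value'  => $match[0],
            'offset' => $match[1],
            'length' => strlen($match[0]), // byte length, not character count
        );
    }, $matches[0]);
}

print_r(tokenize_with_offsets('token1 token2 tok3'));
```

For the sample string this should yield three entries with offsets 0, 7 and 14. An empty or all-whitespace string gives an empty array, because preg_match_all finds no matches.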