
Split string before and after double-bracketed placeholders


I'm trying to split a string into an array of parts.

Example string:

The quick brown fox [[random text here]] and then [[a different text here]]

The text between the square brackets will change and cannot be known ahead of time. The preg_split I have so far does split the string, but it puts the brackets in the neighboring elements of the resulting array instead of in the element with the text they enclose.

$page_widget_split = preg_split('@(?<=\[\[)(.*?)(?=\]\])@', $page_content,-1, PREG_SPLIT_DELIM_CAPTURE);

This produces:

[0] => "The quick brown fox [[",
[1] => "random text here",
[2] => "]] and then [[",
[3] => "a different text here",
[4] => "]]"

The desired result would look like this...

[0] => "The quick brown fox ",
[1] => "[[random text here]]",
[2] => " and then ",
[3] => "[[a different text here]]"

As I'm far from understanding regex, could someone please take a look and tell me what I'm missing?


Solution

  • This will get you pretty close

     $page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';
    
     print_r(preg_split('/(\[\[[^\]]+\]\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
    

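    The thing to remember is that the delimiter is the whole (\[\[[^\]]+\]\]), brackets included. In your pattern, (?<=\[\[) and (?=\]\]) are zero-width lookarounds: they check that the brackets are there without consuming them, so the brackets were never part of the delimiter and ended up in the neighboring pieces.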

    Output:

    Array
    (
        [0] => the quick brown fox 
        [1] => [[random text here]]
        [2] =>  and then 
        [3] => [[a different text here]]
    )
    


    When I say pretty close, I do mean really close.

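    The regex is pretty straightforward: match two [, then one or more characters that are not ], then two ]. That is our delimiter, and we capture it so PREG_SPLIT_DELIM_CAPTURE returns it as its own element. The PREG_SPLIT_NO_EMPTY flag is nice too: it drops the empty strings preg_split would otherwise return when the text starts or ends with a placeholder.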
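
To see the PREG_SPLIT_NO_EMPTY difference concretely, here is a small sketch that splits a made-up string which starts and ends with a placeholder, once with and once without the flag. The sample string is invented for illustration; the pattern is the one from this answer.

```php
<?php
// Same delimiter as above: "[[", one or more non-"]" characters, then "]]".
$pattern = '/(\[\[[^\]]+\]\])/';
$input   = '[[first]] middle text [[last]]';

// Without PREG_SPLIT_NO_EMPTY, the empty string before the first delimiter
// and the empty string after the last one are kept.
$withEmpty = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE);

// With PREG_SPLIT_NO_EMPTY, those empty pieces are dropped.
$noEmpty = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($withEmpty); // 5 elements: '', '[[first]]', ' middle text ', '[[last]]', ''
print_r($noEmpty);   // 3 elements: '[[first]]', ' middle text ', '[[last]]'
```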

    Enjoy!

    UPDATE

    A commenter pointed out: "But it fails on 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text'... Note the [] under 'columns'."
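
The reason the first pattern fails there: [^\]]+ can only run up to the first ], which is the one closing the inner [], and the character after it is } rather than a second ], so the delimiter never matches and preg_split returns the input unsplit. A quick check, using the commenter's string without its leading space:

```php
<?php
// The simple, non-recursive delimiter from the first example.
$pattern = '/(\[\[[^\]]+\]\])/';
$input   = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text';

// [^\]]+ consumes everything up to the "]" of the inner "[]"; the next
// character is "}", not "]", so "\]\]" cannot match and nothing is split.
$parts = preg_split($pattern, $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts); // a single element: the unchanged input string
```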

    To handle that you need a recursive pattern using (?R), which re-applies the whole pattern at that point, so each nested [ ... ] pair is matched as part of the enclosing one:

    $page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
    
    print_r(preg_split('/(\[(?:[^\[\]]|(?R))*\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
    

    Output:

    Array
    (
        [0] => here is my table 
        [1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
        [2] =>  and this is more text 
        [3] => [someother bracket] //single bracket capture
    )
    


    I won't pretend otherwise: this is at the edge of my regex knowledge. I should note that this pattern matches single brackets too, not specifically double ones. You could try something like /(\[(\[(?:[^\[\]]|(?2))*\])\])/ instead; (?2) works like (?R), but it recurses into one specific capture group (here group 2) rather than the whole pattern. That matches only [[ ... ]] while still allowing nesting inside. The problem is that preg_split now returns both capture groups, so the placeholder shows up twice:

    Array
    (
        [0] => here is my table 
        [1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
        [2] => [{"widget":"table","id":"1","title": "Views Table", "columns": []}]
        [3] =>  and this is more text [someother bracket]
    )
    

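    Notice that it no longer splits out [someother bracket], but it returns the double-bracket placeholder twice: once as the full [[ ... ]] (group 1) and once without the outer brackets (group 2). There may be a way around that, but I can't think of one.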
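
One possible workaround, which is not part of the original answer, is to move the recursive part into a (?(DEFINE)...) block and call it by name. The DEFINE group never matches on its own, and PCRE discards captures made during a subroutine call once the call returns, so only group 1 reports a value and the placeholder comes back once. This is a sketch assuming the PCRE library bundled with PHP (which supports (?(DEFINE)...) and (?&name) calls); the group name br is arbitrary.

```php
<?php
// Group 1: "[" + one balanced [...] block + "]", i.e. a [[ ... ]] placeholder.
// Named group "br" (arbitrary, hypothetical name) is a balanced [...] block
// that may nest. It sits inside (?(DEFINE)...), so it is only ever used via
// (?&br) calls, and captures set during those calls are discarded afterwards.
$pattern = '/(\[(?&br)\])(?(DEFINE)(?<br>\[(?:[^\[\]]++|(?&br))*\]))/';

$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';

$parts = preg_split($pattern, $page_content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($parts);
// 3 elements: 'here is my table ', the full [[...]] block (only once),
// and ' and this is more text [someother bracket]'
```

Like the (?2) version, this still leaves [someother bracket] inside the plain text, since it requires two opening brackets.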

    Whether capturing single bracket pairs is a problem for you, I don't know.

    I have used this technique before, mainly for matching balanced pairs of " or ( ), but it's the same concept.

    The only other solution would be to write a lexer/parser for it; I have some examples of how to do that on my GitHub account. Regex by itself is not well suited to nested elements: even with recursion, the patterns above break as soon as the content gets more complex, for example a ] inside a quoted JSON string.
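
To give an idea of what the lexer route looks like, here is a minimal hand-written scanner sketch. It is not taken from the GitHub examples mentioned above, and the function name split_placeholders is hypothetical. It walks the string once, counts [ and ] to find where each top-level [[ block closes, and inside a block it skips double-quoted strings so a ] in a JSON value does not end the block early, which the patterns above do not handle. Assumptions: byte-wise scanning (safe for UTF-8 because the delimiters are ASCII), JSON-style backslash escapes, and an unterminated [[ leaves the rest of the string as plain text.

```php
<?php
/**
 * Hypothetical helper: split $text into plain-text pieces and top-level
 * "[[ ... ]]" placeholders by scanning bytes and counting bracket depth.
 * Inside a placeholder, brackets within double-quoted strings are ignored.
 */
function split_placeholders(string $text): array
{
    $parts = [];
    $len = strlen($text);
    $plainStart = 0; // start of the current plain-text run
    $i = 0;

    while ($i < $len) {
        // A placeholder can only start with "[[" while we are in plain text.
        if ($text[$i] !== '[' || $i + 1 >= $len || $text[$i + 1] !== '[') {
            $i++;
            continue;
        }

        $depth = 0;
        $inString = false;
        $end = null;
        for ($j = $i; $j < $len; $j++) {
            $c = $text[$j];
            if ($inString) {
                if ($c === '\\') {
                    $j++;            // skip the escaped character
                } elseif ($c === '"') {
                    $inString = false;
                }
            } elseif ($c === '"') {
                $inString = true;
            } elseif ($c === '[') {
                $depth++;
            } elseif ($c === ']') {
                $depth--;
                if ($depth === 0) {
                    $end = $j;       // index of the bracket that closes the block
                    break;
                }
            }
        }

        if ($end === null) {
            break; // unbalanced: the rest is treated as plain text below
        }

        if ($i > $plainStart) {
            $parts[] = substr($text, $plainStart, $i - $plainStart);
        }
        $parts[] = substr($text, $i, $end - $i + 1);
        $i = $end + 1;
        $plainStart = $i;
    }

    if ($plainStart < $len) {
        $parts[] = substr($text, $plainStart);
    }
    return $parts;
}

// Usage with the string from the comment:
print_r(split_placeholders('here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]'));
```

The trade-off is more code than a one-line pattern, but each rule (where a block starts, how nesting and quoting work, what happens on bad input) is explicit and easy to change.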