I'm trying to split a string into an array of parts.
String Example...
The quick brown fox [[random text here]] and then [[a different text here]]
Text between the square brackets will change and cannot be determined ahead of time. The preg_split I have so far will split, but it places the delimiters in other elements in the produced array, not the element I want it to be in.
$page_widget_split = preg_split('@(?<=\[\[)(.*?)(?=\]\])@', $page_content,-1, PREG_SPLIT_DELIM_CAPTURE);
This produces something like this...
[0] => "The quick brown fox [[",
[1] => "random text here]]",
[2] => " and then [[",
[3] => "a different text here]]"
The desired result would look like this...
[0] => "The quick brown fox",
[1] => "[[random text here]]",
[2] => " and then ",
[3] => "[[a different text here]]"
As I'm far from understanding regex, could someone please take a look and tell me what I'm missing in the pattern?
This will get you pretty close
$page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';
print_r(preg_split('/(\[\[[^\]]+\]\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
The thing to remember is that this is the delimiter (\[\[[^\]]+\]\])
Output:
Array
(
[0] => the quick brown fox
[1] => [[random text here]]
[2] => and then
[3] => [[a different text here]]
)
When I say pretty close, I do mean really pretty close...
The regex is pretty straightforward: match 2 [, then anything but a ], then 2 of those ]. That makes our delimiter, which we then capture. The no-empty flag (PREG_SPLIT_NO_EMPTY) is nice too.
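For readers who find the one-line pattern hard to scan, here is the same delimiter written with the x (extended) flag so each piece can carry a comment. This is only a restatement of the pattern above, not a different technique; the variable name $parts is just for illustration.

```php
<?php
// Same delimiter pattern as above, spelled out with the x (extended) flag.
$pattern = '/
    (          # capture group, so PREG_SPLIT_DELIM_CAPTURE returns the delimiter
      \[\[     # two literal opening brackets
      [^\]]+   # one or more characters that are not a closing bracket
      \]\]     # two literal closing brackets
    )
/x';

$page_content = 'the quick brown fox [[random text here]] and then [[a different text here]]';
$parts = preg_split($pattern, $page_content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($parts);
```

Note that the plain-text pieces keep their surrounding spaces, so wrap them in trim() if you need them clean.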
Enjoy!
UPDATE
but it fails on " here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text"...Note the "[]" under the 'columns'
To handle that you will need a recursive regex pattern using (?R), like this:
$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
print_r(preg_split('/(\[(?:[^\[\]]|(?R))*\])/', $page_content, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
Output:
Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => and this is more text
[3] => [someother bracket] //single bracket capture
)
I won't pretend, this is kind of at the edge of my knowledge of regex. I should note this matches single brackets and not specifically double ones. You could try something like this: /(\[(\[(?:[^\[\]]|(?2))*\])\])/. The (?2) is like (?R), but it recurses into a specific capture group instead of the whole pattern. This works to match only [[ ... ]] while keeping the inner nesting. But the issue is that the inner capture group is also returned as a delimiter, so you wind up with this:
Array
(
[0] => here is my table
[1] => [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]]
[2] => [{"widget":"table","id":"1","title": "Views Table", "columns": []}]
[3] => and this is more text [someother bracket]
)
Notice how it doesn't capture [someother bracket], but it captures the other one 2 times. There may be a way around that, but I can't think of it.
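One possible way around the duplicated capture, offered as a sketch rather than a definitive fix, is to stop using preg_split and instead locate the [[ ... ]] blocks with preg_match_all and PREG_OFFSET_CAPTURE, then cut the string at those offsets. Only the full match ($matches[0]) is used, so the inner group never ends up in the result. The function name split_on_double_brackets is hypothetical.

```php
<?php
// Hypothetical helper: split $text into plain text and [[ ... ]] blocks.
function split_on_double_brackets(string $text): array
{
    // Outer [ ... ] wrapped around group 1, a balanced [ ... ] that may nest via (?1).
    $pattern = '/\[(\[(?:[^\[\]]|(?1))*\])\]/';
    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $parts = [];
    $pos = 0; // byte offset of the first character not yet emitted
    foreach ($matches[0] as [$match, $offset]) {
        if ($offset > $pos) {
            $parts[] = substr($text, $pos, $offset - $pos); // text before the block
        }
        $parts[] = $match;                                  // the [[ ... ]] block itself
        $pos = $offset + strlen($match);
    }
    if ($pos < strlen($text)) {
        $parts[] = substr($text, $pos);                     // trailing text
    }
    return $parts;
}

$page_content = 'here is my table [[{"widget":"table","id":"1","title": "Views Table", "columns": []}]] and this is more text [someother bracket]';
print_r(split_on_double_brackets($page_content));
```

With this approach the single-bracket [someother bracket] stays inside the trailing text element instead of being split out.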
Whether or not capturing single bracket pairs is an issue, I don't know.
But I have used this before, mainly for matching matched pairs of " or ( ), but it's the same concept.
The only other solution would be to make a lexer/parser for it. I have some examples of how to do that on my GitHub account. Regex (by itself) is not well suited to nested elements, and most regex solutions will fail on nesting.
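To give a rough idea of the non-regex route, here is a minimal hand-written scanner that tracks bracket depth. It is a sketch under simple assumptions, not the code from the GitHub examples mentioned above: a block starts at [[ seen outside any block and ends when the bracket depth returns to zero, and the function name split_widgets is hypothetical.

```php
<?php
// Hypothetical depth-counting scanner: splits text into plain text and [[ ... ]] blocks.
function split_widgets(string $text): array
{
    $parts = [];
    $buffer = '';
    $depth = 0; // number of currently open brackets inside a block
    $len = strlen($text);

    for ($i = 0; $i < $len; $i++) {
        // A block only starts on "[[" while we are outside any block.
        if ($depth === 0 && substr($text, $i, 2) === '[[') {
            if ($buffer !== '') {
                $parts[] = $buffer; // flush the plain text collected so far
            }
            $buffer = '[[';
            $depth = 2;
            $i++;                   // skip the second "["
            continue;
        }

        $ch = $text[$i];
        $buffer .= $ch;

        if ($depth > 0) {
            if ($ch === '[') {
                $depth++;
            } elseif ($ch === ']') {
                $depth--;
                if ($depth === 0) { // block is balanced again: emit it
                    $parts[] = $buffer;
                    $buffer = '';
                }
            }
        }
    }

    if ($buffer !== '') {
        $parts[] = $buffer;         // trailing text, or an unterminated block
    }
    return $parts;
}

print_r(split_widgets('here is my table [[{"widget":"table","columns": []}]] and more [single]'));
```

This does not know about quoted strings, so a ] inside a JSON string value would still throw the count off; handling that is where a real lexer starts to pay for itself.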