I've got the following string (example):
Loader[data-prop data-attr="value"]
There can be 1 to n attributes, and I want to extract every one of them (data-prop, data-attr="value"). I tried many different approaches, for example \[(?:(\S+)\s)*\], but I couldn't get it right. The expression should be written in PREG (PCRE) style.
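For reference, here is a small PHP check of why the attempt from the question fails (the variable names $attempt and $m are only for illustration). The pattern requires whitespace after every attribute, so the last attribute before ] can never match. Even when the pattern does match, a repeated capturing group keeps only its last iteration:

```php
<?php
// The pattern from the question: a repeated group "(\S+)\s" between brackets.
$attempt = '~\[(?:(\S+)\s)*\]~';

// The last attribute is followed by "]" instead of whitespace, so nothing matches.
var_dump(preg_match($attempt, 'Loader[data-prop data-attr="value"]', $m)); // int(0)

// With a trailing space the pattern matches, but the repeated Group 1
// only holds its last iteration: data-attr="value".
preg_match($attempt, 'Loader[data-prop data-attr="value" ]', $m);
print_r($m);
```

This is why the answer below collects one attribute per match with preg_match_all instead of relying on a repeated group.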
I suggest grabbing all the key-value pairs with a regex:
'~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~'
(see regex demo) and then merging the captured groups and filtering out the leftovers (see IDEONE demo):
$re = '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~';
$str = "Loader[data-prop data-attr=\"value\" more-here='data' and-one-more=\"\"]";
preg_match_all($re, $str, $matches);
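$arr = array();
// Skip $matches[0] (whole matches); merge the captures of Groups 1-3, dropping empty strings
for ($i = 1; $i < count($matches); $i++) {
    $arr = array_merge(array_filter($matches[$i]), $arr);
}
// Remove the lone ' and " characters captured by Group 3
print_r(preg_grep('~\A(?![\'"]\z)~', $arr));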
Output:
Array
(
[3] => data-prop
[4] => data-attr="value"
[5] => more-here='data'
[6] => and-one-more=""
[7] => Loader
)
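If you only need the attributes, a simpler variant (a sketch, not the code from the IDEONE demo) is to read Group 2 directly, since it already holds one whole attribute per match, and to take the name from Group 1 of the first match. The input string is hypothetical: a stray word after ] is appended to show that the \G chaining stops at the closing bracket.

```php
<?php
// Same pattern: Group 1 = text before "[", Group 2 = one whole attribute per match.
$re = '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~';
// Hypothetical input: "stray-word" after "]" must not be picked up.
$str = "Loader[data-prop data-attr=\"value\" more-here='data' and-one-more=\"\"] stray-word";

$name = null;
$attributes = array();
if (preg_match_all($re, $str, $matches)) {
    // Only the first match starts with the "Name[" alternative, so it alone has a non-empty Group 1.
    $name = $matches[1][0];
    // Group 2 already contains each attribute, so no merging or filtering is needed.
    $attributes = $matches[2];
}

echo $name, "\n"; // Loader
print_r($attributes);
```

This skips both the merge loop and the preg_grep cleanup; the trade-off is that it assumes a single Name[...] block, so the name only appears in the first match.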
Notes on the regex (it only looks complex):
(?:([^][]*)\b\[|(?!^)\G) - a boundary: matching only starts at a [ that is preceded by a word character (a-zA-Z0-9_) (with \b\[), or continues right after the previous successful match (with (?!^)\G, where (?!^) keeps \G from matching at the very start of the string). Also, ([^][]*) captures into Group 1 the part before the [ (any characters other than [ and ]).
\s* - matches zero or more whitespace characters.
\w+(?:-\w+)* - matches "words" like word1 or word1-word2 ... word1-wordn. Together with the optional value part below, it is wrapped in Group 2, so Group 2 holds the whole attribute.
(?:=(["\'])?[^\]]*?\3)? - an optional group (because of (?:...)?) matching:
  = - an equal sign
  (["\'])? - Group 3 (an auxiliary group that records the value delimiter), capturing either ", ' or nothing
  [^\]]*? - the value: zero or more characters other than ], as few as possible
  \3 - the closing ' or " (the same character captured in Group 3). In PCRE a backreference to a group that did not participate fails, so in practice the value must be quoted.

Since we cannot avoid capturing the lone ' or ", we filter those elements out with preg_grep('~\A(?![\'"]\z)~', $arr), where \A(?![\'"]\z) matches any string that is not exactly ' or ".
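The filter can be checked on its own. The sample array below is hypothetical, mimicking what the merge loop produces. The negative-lookahead pattern keeps everything except a lone quote; matching the lone quotes and passing PREG_GREP_INVERT is an equivalent formulation. Both keep the original array keys:

```php
<?php
// Hypothetical sample: quotes from Group 3 mixed in with the attributes and the name.
$arr = array('"', "'", '"', 'data-prop', 'data-attr="value"', 'Loader');

// Keep every element that is not exactly a single ' or " character.
$kept = preg_grep('~\A(?![\'"]\z)~', $arr);

// Equivalent: select the lone quotes, then invert the selection.
$inverted = preg_grep('~\A[\'"]\z~', $arr, PREG_GREP_INVERT);

print_r($kept);                // keys 3, 4 and 5 survive; preg_grep preserves keys
var_dump($kept === $inverted); // bool(true)
```

Because the keys are preserved, wrap the result in array_values() if you need a list indexed from 0.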