php regex preg-match placeholder text-extraction

Parse square braced placeholder and extract the dynamic number of "data-" attribute declarations individually


I've got the following string (example):

Loader[data-prop data-attr="value"]

There can be 1 to n attributes, and I want to extract each attribute individually (data-prop, data-attr="value"). I tried many different approaches, for example \[(?:(\S+)\s)*\], but I couldn't get it right. The expression should be written in PREG (PCRE) style.
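
A minimal reproduction of that attempt, as a sketch (the variable names are only illustrative). It suggests two separate problems: every repetition demands trailing whitespace, so the pattern cannot reach the closing bracket, and a repeated capturing group only retains its last iteration anyway.

```php
<?php
// The example string and the attempted pattern from above.
$str     = 'Loader[data-prop data-attr="value"]';
$attempt = '~\[(?:(\S+)\s)*\]~';

// Each repetition must end in whitespace (\s), but the last attribute is
// followed directly by "]", so there is no match at all.
$found = preg_match($attempt, $str, $m);
var_dump($found); // int(0)

// With a space before "]" the pattern matches, yet group 1 only holds the
// attribute from the final repetition.
$found2 = preg_match($attempt, 'Loader[data-prop data-attr="value" ]', $m2);
echo $m2[1], "\n"; // data-attr="value"
```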


Solution

  • I suggest grabbing all the key-value pairs with a regex:

    '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~'
    

    (see regex demo) and then merging the captured groups into a single array and filtering out the lone quote captures.

    See IDEONE demo:

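    $re = '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~';
    $str = "Loader[data-prop data-attr=\"value\" more-here='data' and-one-more=\"\"]";
    preg_match_all($re, $str, $matches);
    $arr = array();
    // Skip $matches[0] (the full matches) and merge capture groups 1 to 3.
    for ($i = 1; $i < count($matches); $i++) {
        // array_filter() drops the empty entries left by groups that did not take part in a match.
        $arr = array_merge(array_filter($matches[$i]), $arr);
    }
    // Keep everything except the entries that are just a single quote character (group 3).
    print_r(preg_grep('~\A(?![\'"]\z)~', $arr));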
    

    Output:

    Array
    (
        [3] => data-prop
        [4] => data-attr="value"
        [5] => more-here='data'
        [6] => and-one-more=""
        [7] => Loader
    )
    

    Notes on the regex (it only looks too complex):
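
    One way to read the pattern is to rewrite it with the x (extended) modifier, which lets each part carry a comment. The version below is a sketch of the same pattern with my reading of each piece. The shortened sample string is only for illustration.

    ```php
    <?php
    // Same pattern, spread out with the x flag: PCRE ignores the whitespace
    // and the #-comments outside of character classes.
    $re = '~
        (?:
            ([^][]*) \b \[      # group 1: the name in front of the opening bracket
          |
            (?!^) \G            # or: continue exactly where the previous match ended
        )
        \s*                     # optional whitespace between attributes
        (                       # group 2: one complete attribute
            \w+ (?: - \w+ )*    #   hyphenated key such as data-prop
            (?:
                = (["\'])?      #   group 3: the opening quote, if there is one
                [^\]]*?         #   the value, as short as possible
                \3              #   the same quote again
            )?                  #   the whole =value part is optional
        )
    ~x';

    $str = 'Loader[data-prop data-attr="value"]';
    preg_match_all($re, $str, $matches);

    print_r($matches[1]); // "Loader" for the first match, "" afterwards
    print_r($matches[2]); // data-prop, data-attr="value"
    ```

    The \G branch is what allows a dynamic number of attributes: each match picks up where the previous one stopped, so every attribute becomes a separate match instead of one more repetition of a single group.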

    Since the backreference \3 requires the ' or " delimiter to be captured, the merged array also contains entries that are just a lone quote. We filter them out with preg_grep('~\A(?![\'"]\z)~', $arr): the pattern \A(?![\'"]\z) matches any string that is not exactly ' or ", so only the name and the attributes are kept.
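
    As a possible alternative to merging and filtering, the capture groups can be read directly, since preg_match_all() returns one sub-array per group by default. This is only a sketch under that assumption, and the names $name and $attributes are illustrative.

    ```php
    <?php
    $re  = '~(?:([^][]*)\b\[|(?!^)\G)\s*(\w+(?:-\w+)*(?:=(["\'])?[^\]]*?\3)?)~';
    $str = "Loader[data-prop data-attr=\"value\" more-here='data' and-one-more=\"\"]";

    preg_match_all($re, $str, $matches);

    // Group 1 is non-empty only for the first match, which carries the name.
    $name = $matches[1][0] ?? null;

    // Group 2 holds one complete attribute per match; the quotes captured by
    // group 3 are simply never read, so no preg_grep() step is needed.
    $attributes = $matches[2];

    echo $name, "\n";     // Loader
    print_r($attributes); // data-prop, data-attr="value", more-here='data', and-one-more=""
    ```

    This keeps the name separate from the attributes and gives the attributes consecutive keys starting at 0, at the cost of producing a different array shape than the output shown above.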