regex openrefine

Regex: get everything that is not inside square brackets


I am trying to get everything that is not inside square brackets with a regex in OpenRefine, but I'm stuck.

So far I have this:

/^([a-z]+)\[[a-z]+\]([a-z]+)/

But I can't "repeat" my rule so that it applies in all of these cases.

Here are my test strings:

abcd[zz]efgh[zz]ijkl[zz]
# I want: abcd efgh ijkl

abcd[zz]efgh[zz]ijkl
# I want: abcd efgh ijkl

abcd[zz]efgh
# I want: abcd efgh

abcd[zz]
# I want: abcd

[zz]abcd
# I want: abcd

Thank you in advance


Solution

  • You can extract substrings that contain no [ or ] and that are not immediately followed by any chars other than square brackets and then a ] char:

    (?=([^\]\[]+))\1(?![^\]\[]*])
    

    The trick is also to make the first pattern atomic, so that backtracking cannot return only part of a match. In JavaScript regex, an atomic pattern can be emulated with a positive lookahead that captures the pattern, followed immediately by a backreference to the captured text.
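As a rough way to check the idea against the test strings from the question, here is a minimal sketch. It uses Python's re module purely for illustration; that the OpenRefine regex engine treats the pattern identically is an assumption. The pattern is written with the lookahead as (?![^\]\[]*]), following the description above, and the helper name outside_brackets is hypothetical.

```python
import re

# Atomic run of non-bracket characters (lookahead capture + backreference),
# rejected when the run is immediately followed by a closing bracket.
PATTERN = re.compile(r"(?=([^\]\[]+))\1(?![^\]\[]*])")


def outside_brackets(text: str) -> list[str]:
    """Return every run of characters that is not inside [...]."""
    # finditer + group(0) is used because findall would return only group 1.
    return [m.group(0) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    samples = [
        "abcd[zz]efgh[zz]ijkl[zz]",
        "abcd[zz]efgh[zz]ijkl",
        "abcd[zz]efgh",
        "abcd[zz]",
        "[zz]abcd",
    ]
    for s in samples:
        # Join the extracted runs with spaces, as in the desired output.
        print(s, "->", " ".join(outside_brackets(s)))
```

Without the atomic step, a plain [^\]\[]+ could backtrack inside the brackets and return a partial run such as the first z of zz, which is what the lookahead-plus-backreference construction prevents.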

    Details: