I'm trying to use a regex to match any duplicate words (alphanumeric, possibly containing dashes) in some YAML with a PCRE tool.
I have found a regex that matches consecutive duplicates:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
It will catch the duplicates in:
hello-world,hello-world,goodbye-world,goodbye-world
but not the hello-worlds in
hello-world,goodbye-world,goodbye-world,hello-world
Could someone help me try to build a regex pattern for the second case (or both cases)?
You may use this regex:
(?<=,|^)([^,]+)(?=(?>,[^,]*)*,\1(?>,|$)),
RegEx Details:
(?<=,|^): Assert that we have , or the start position before the current position
([^,]+): Match 1+ characters of non-comma text and capture them in group #1
(?=(?>,[^,]*)*,\1(?>,|$)): Lookahead to assert that the same value we captured in group #1 appears ahead of us
,: Match the , that follows the captured value
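To try the pattern outside a PCRE tool, here is a minimal Python sketch. It is an adaptation, not the exact regex above: Python's re module requires fixed-width lookbehind, so (?<=,|^) is rewritten as (?:^|(?<=,)), and the atomic groups (?>...) are replaced with plain non-capturing groups so it also runs on versions before 3.11 (for this pattern that should only affect backtracking speed, not what matches). The names DUPLICATE, find_duplicates and drop_earlier_duplicates are hypothetical.

```python
import re

# Assumed Python adaptation of the PCRE pattern from the answer:
#   (?<=,|^)  ->  (?:^|(?<=,))   because re needs fixed-width lookbehind
#   (?>...)   ->  (?:...)        atomic groups only exist in Python 3.11+
DUPLICATE = re.compile(r"(?:^|(?<=,))([^,]+)(?=(?:,[^,]*)*,\1(?:,|$)),")


def find_duplicates(line):
    """Return every value that occurs again later in the comma-separated line."""
    return DUPLICATE.findall(line)


def drop_earlier_duplicates(line):
    """Delete each value (and its comma) that is repeated later, keeping the last copy."""
    return DUPLICATE.sub("", line)


if __name__ == "__main__":
    sample = "hello-world,goodbye-world,goodbye-world,hello-world"
    print(find_duplicates(sample))          # ['hello-world', 'goodbye-world']
    print(drop_earlier_duplicates(sample))  # goodbye-world,hello-world
```

Because the match includes the trailing comma, replacing each match with an empty string leaves only the last occurrence of every value. A value that appears three times is reported twice by find_duplicates, once for each occurrence that has a later copy.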