I'm having trouble getting set operators to work in the regex module (regex 2013-11-29) on Python 3.x. For example, to match ASCII characters minus punctuation I have tried:
import regex as rx
data = '(foo)'
for m in rx.finditer(r'[\p{ASCII}--\p{P}]+',data):
    print(m.group(0))  # expect 'foo', getting '(foo)'
The documentation gives this example:
[\p{N}--[0-9]] # Set containing all numbers except '0' .. '9'
Am I missing something here?
It sounds like you need to explicitly opt into Version 1 behavior so that the -- is interpreted as a set operator and not as characters to include in the class.
From the module web page:
Version 1 behaviour (new behaviour, different from the current re module):
Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
.split will split a string at a zero-width match.
Inline flags apply to the end of the group or pattern, and they can be turned off.
Nested sets and set operations are supported.
Case-insensitive matches in Unicode use full case-folding by default.
If no version is specified, the regex module will default to regex.DEFAULT_VERSION. In the short term this will be VERSION0, but in the longer term it will be VERSION1.
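Applied to the pattern from the question, that would look roughly like the sketch below. It shows the two ways of opting in that the quoted documentation mentions: the inline (?V1) flag and the V1 flag passed as an argument. The variable names matches_inline and matches_flag are only for illustration, and findall is used instead of finditer to keep it short.

```python
import regex as rx

data = '(foo)'

# Option 1: turn on Version 1 behaviour inline, at the start of the pattern,
# so that -- is treated as set difference rather than literal hyphens.
matches_inline = rx.findall(r'(?V1)[\p{ASCII}--\p{P}]+', data)
print(matches_inline)  # ['foo']

# Option 2: leave the pattern unchanged and pass the V1 flag instead.
matches_flag = rx.findall(r'[\p{ASCII}--\p{P}]+', data, flags=rx.V1)
print(matches_flag)  # ['foo']
```

Either form should give 'foo' for the sample data, since the parentheses are punctuation and are removed from the set.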