I'm slowly fine-tuning my sieve filter. I noticed I was getting a lot of spam in Russian, so I thought I could filter on the presence of Cyrillic in the subject. I thought maybe three consecutive characters would be a good test, and it seems to work pretty well. Here's the line:
elsif header :regex "Subject" [ "[а-яА-Я]{3,}" ]
It's not ideal, because there are plenty of Cyrillic characters outside the А-Я range. Also, I'd like to do the same with CJK characters, and I'm not sure even how to begin with those.
Is it possible in sieve to specify a script as a character class? I've done it before in other regex implementations, but it seems to me that it's handled differently, if at all, by different regex flavours.
Thanks, Ben
You can use
[\p{Cyrillic}\p{Han}]{3}
Details:
[
- start of a character class
\p{Cyrillic}
- any Cyrillic char
\p{Han}
- any Chinese char
]{3}
- end of the character class, three repetitions.
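If you want to check which subjects a pattern like this would catch before putting it in the filter, here is a rough Python sketch. Python's built-in re module doesn't support \p{...}, so this substitutes explicit code point ranges: U+0400 to U+052F (the Cyrillic and Cyrillic Supplement blocks) and U+4E00 to U+9FFF (the main CJK Unified Ideographs block). Those ranges are my approximation of the two scripts, not an exact equivalent, and the names SCRIPT_RUN and looks_foreign are made up for the example.

```python
import re

# Stand-in for [\p{Cyrillic}\p{Han}]{3} using explicit ranges (an assumption):
#   U+0400-U+052F  Cyrillic and Cyrillic Supplement blocks
#   U+4E00-U+9FFF  CJK Unified Ideographs block
SCRIPT_RUN = re.compile(r"[\u0400-\u052F\u4E00-\u9FFF]{3}")


def looks_foreign(subject: str) -> bool:
    """Return True if the subject contains three consecutive
    characters from the Cyrillic or CJK ranges above."""
    return SCRIPT_RUN.search(subject) is not None


if __name__ == "__main__":
    for s in ["Привет, мир", "Re: meeting notes", "你好世界"]:
        print(s, looks_foreign(s))
```

Since the search is unanchored, {3} already matches any run of three or more, so {3,} isn't needed. The same explicit ranges might also serve as a fallback in a regex flavour that lacks \p{...}, though I haven't tested that in Sieve.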