Tags: regex, unicode, cjk, cyrillic, sieve-language

Does sieve's version of regex allow me to look for Cyrillic or CJK characters?


I'm slowly fine-tuning my sieve filter. I noticed I was getting a lot of spam in Russian, so I thought I could filter on the presence of Cyrillic in the subject. I thought maybe three consecutive characters would be a good test, and it seems to work pretty well. Here's the line:

elsif header :regex "Subject" [ "[а-яА-Я]{3,}" ]

It's not ideal, because there are plenty of Cyrillic characters outside the А-Я range. Also, I'd like to do the same with CJK characters, and I'm not even sure how to begin with those.
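One possible fallback, sketched here under stated assumptions. The Sieve regex extension is specified in terms of POSIX extended regular expressions, which have no standard notation for Unicode scripts. Where the server's engine offers nothing beyond that, a rule can list explicit code-point ranges instead. The sketch widens the Cyrillic range to the whole Cyrillic block (U+0400 to U+04FF, written as the literal characters Ѐ and ӿ) and adds the classic CJK Unified Ideographs range (U+4E00 to U+9FA5, written as 一 and 龥). It assumes three things: the script is saved as UTF-8, the server compares bracket ranges by character in a UTF-8 locale rather than byte by byte, and a folder named "Junk" exists. "Junk" is a hypothetical name. Neither range is exhaustive: Cyrillic Supplement, the Cyrillic Extended blocks, kana, hangul and the CJK extension blocks are not covered.

```sieve
require ["regex", "fileinto"];

# Sketch only. Assumes the server evaluates bracket ranges by Unicode
# code point (UTF-8 locale). In a byte-oriented "C" locale these ranges
# could match bytes belonging to other scripts instead.
#   [Ѐ-ӿ]  Cyrillic block, U+0400 to U+04FF
#   [一-龥]  CJK Unified Ideographs, U+4E00 to U+9FA5
# "Junk" is a hypothetical folder name; adjust to your setup.
if header :regex "Subject" [ "[Ѐ-ӿ]{3,}", "[一-龥]{3,}" ] {
    fileinto "Junk";
}
```

The header test compares the decoded Subject (RFC 2047 encoded words are decoded before matching), so the patterns can be written with literal characters rather than encoded bytes.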

Is it possible in sieve to specify a script as a character class? I've done it before in other regex implementations, but it seems to me that it's handled differently, if at all, by different regex flavours.

Thanks, Ben


Solution

  • You can use

    [\p{Cyrillic}\p{Han}]{3}
    

    Details: