I'm using JRuby 1.7.18 and have also tried JRuby 9000 (the latest version), where I get the same error. I'm using the soap-4r and nokogiri libraries to parse a WSDL XML file.
When the following part of the WSDL is parsed:
<xs:pattern value="[\p{IsBasicLatin}]*"/>
I get the following error:
RegexpError: (RegexpError) invalid character property name <IsBasicLatin>: /\A[\p{IsBasicLatin}]*\z/n
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
nokogiri/XmlSaxParserContext.java:252:in `parse_with'
In Ruby 1.9, which is one of the Ruby versions that JRuby 1.7.18 is compatible with, I read that character blocks like \p{IsBasicLatin} are not supported, but scripts like \p{Latin} are. I've tried changing IsBasicLatin to Latin, and also tried a few others such as InBasicLatin and InBasic_Latin, but they all return the same error.
This happens in both JRuby 1.7.18 and JRuby 9000.
What is going wrong here and how can I fix it?
As mentioned in the comments, the name of the character property is actually In_Basic_Latin and not IsBasicLatin. Modern versions of Ruby (MRI, or CRuby to be specific) use the regular expression library Onigmo. The official Ruby docs don't list all Unicode properties, but luckily Onigmo's documentation does.
JRuby doesn't seem to implement the Unicode block properties, at least. However, the name and code point range of each block are publicly available. \p{In_Basic_Latin} is therefore equivalent to [\u0000-\u007F]. So is [[:ascii:]].
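One way to apply this is to rewrite the pattern before it is compiled, replacing the block escape with its explicit range. The following is only a minimal sketch: XSD_BLOCK_CLASSES and translate_xsd_pattern are hypothetical names, not part of soap-4r or nokogiri, and where you hook the rewrite in (for example by preprocessing the WSDL text before handing it to the library) is left as an assumption.

```ruby
# Hypothetical lookup table: XSD block escape name => equivalent
# character class. Only Basic Latin is listed; add more blocks as needed.
XSD_BLOCK_CLASSES = {
  'IsBasicLatin' => '[\u0000-\u007F]'
}.freeze

# Hypothetical helper: replaces every \p{IsXxx} escape in an XSD pattern
# with an explicit class. The replacement is bracketed so that it is valid
# both inside and outside an existing character class.
def translate_xsd_pattern(xsd_pattern)
  xsd_pattern.gsub(/\\p\{(Is[A-Za-z0-9-]+)\}/) do
    name = Regexp.last_match(1)
    XSD_BLOCK_CLASSES.fetch(name) do
      raise ArgumentError, "no code point range known for #{name}"
    end
  end
end

translated = translate_xsd_pattern('[\p{IsBasicLatin}]*')
# Anchor the pattern the same way the failing regexp in the error was anchored.
anchored = Regexp.new("\\A#{translated}\\z")

puts translated
puts((anchored =~ "Hello, world").inspect)
puts((anchored =~ "h\u00E9llo").inspect)
```

Unknown block names raise an ArgumentError here instead of being passed through, so an unsupported escape fails with a clear message and not with another RegexpError.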