Tags: regex, xml, date, xsd, escaping

How to properly escape Regular Expression pattern in XSD schema?


I need to meet a requirement to accept only values in the form MM/DD/YYYY.

From what I've read at https://www.w3.org/TR/xmlschema11-2/#nt-dateRep, using

<xs:simpleType name="DATE">
    <xs:restriction base="xs:date"/>
</xs:simpleType>

is not going to work, because the lexical representation of xs:date is YYYY-MM-DD (optionally followed by a time zone), not MM/DD/YYYY.
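To see concretely why the xs:date restriction rejects MM/DD/YYYY, here is a minimal Python sketch. The regular expression is our own transcription of the spec's dateLexicalRep production (year, month and day separated by hyphens, plus an optional time zone), not something a validator exposes; `looks_like_xs_date` is a hypothetical helper, and it checks only the lexical shape, not whether the day exists in that month. `re.fullmatch` stands in for XSD's implicit anchoring of the whole value.

```python
import re

# Our transcription (an assumption, not taken from any validator) of the
# XSD 1.1 dateLexicalRep: yearFrag '-' monthFrag '-' dayFrag timezoneFrag?
XS_DATE_LEXICAL = (
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"                      # year, four or more digits
    r"-(0[1-9]|1[0-2])"                                  # month 01-12
    r"-(0[1-9]|[12][0-9]|3[01])"                         # day 01-31 (shape only)
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"     # optional time zone
)


def looks_like_xs_date(value: str) -> bool:
    """Hypothetical helper: True if value has the lexical shape of an xs:date."""
    # fullmatch requires the whole value to match, as an XSD pattern does.
    return re.fullmatch(XS_DATE_LEXICAL, value) is not None


if __name__ == "__main__":
    for candidate in ["2016-12-31", "2016-12-31+05:30", "12/31/2016"]:
        print(candidate, looks_like_xs_date(candidate))
```

Only the hyphenated, year-first values have the xs:date shape; 12/31/2016 fails on its very first characters, because the year must come first.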

I found and adjusted this pattern:

^(?:(?:(?:0?[13578]|1[02])(\/)31)\1|(?:(?:0?[1,3-9]|1[0-2])(\/)(?:29|30)\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:0?2(\/)29\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:(?:0?[1-9])|(?:1[0-2]))(\/)(?:0?[1-9]|1\d|2[0-8])\4(?:(?:1[6-9]|[2-9]\d)?\d{2})$

To this form:

\^\(\?:\(\?:\(\?:0\?\[13578\]\|1\[02\]\)\(\\/\)31\)\1\|\(\?:\(\?:0\?\[1,3-9\]\|1\[0-2\]\)\(\\/\)\(\?:29\|30\)\2\)\)\(\?:\(\?:1\[6-9\]\|\[2-9\]\d\)\?\d{2}\)$\|\^\(\?:0\?2\(\\/\)29\3\(\?:\(\?:\(\?:1\[6-9\]\|\[2-9\]\d\)\?\(\?:0\[48\]\|\[2468\]\[048\]\|\[13579\]\[26\]\)\|\(\?:\(\?:16\|\[2468\]\[048\]\|\[3579\]\[26\]\)00\)\)\)\)$\|\^\(\?:\(\?:0\?\[1-9\]\)\|\(\?:1\[0-2\]\)\)\(\\/\)\(\?:0\?\[1-9\]\|1\d\|2\[0-8\]\)\4\(\?:\(\?:1\[6-9\]\|\[2-9\]\d\)\?\d{2}\)$

Now I no longer get the invalid-escaping errors in the XML editor (XMLSpy), but I get this one instead:

invalid-escape: The given character escape is not recognized.

I did the escaping according to the XML Schema specification at https://www.w3.org/TR/xmlschema-2/#regexs; section F.1.1 contains an escape table.
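To see which escapes in a pattern fall outside that table, here is a small Python sketch that scans a pattern and reports every backslash escape not covered by the escape productions of XML Schema Part 2 (1.0), Appendix F: single-character escapes, multi-character escapes such as \d, and the \p{...} / \P{...} category escapes. `invalid_xsd_escapes` is a hypothetical helper, not a full XSD regex parser; for example, it does not validate the contents of \p{...}.

```python
# Characters allowed after a backslash in an XSD regex, following the escape
# productions of XML Schema Part 2, Appendix F: single-character escapes,
# multi-character escapes such as \d, and the \p{...} / \P{...} category escapes.
SINGLE_CHAR_ESC = set("nrt\\|.?*+(){}-[]^")
MULTI_CHAR_ESC = set("sSiIcCdDwW")
CATEGORY_ESC = set("pP")
ALLOWED_AFTER_BACKSLASH = SINGLE_CHAR_ESC | MULTI_CHAR_ESC | CATEGORY_ESC


def invalid_xsd_escapes(pattern: str) -> list[tuple[int, str]]:
    """Hypothetical helper: list (index, escape) for escapes XSD does not define."""
    problems = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "\\":
            i += 1
            continue
        if i + 1 == len(pattern):
            problems.append((i, "\\"))  # a dangling backslash at the very end
            break
        escape = pattern[i:i + 2]
        if escape[1] not in ALLOWED_AFTER_BACKSLASH:
            problems.append((i, escape))
        i += 2  # consume the pair, so an escaped backslash (\\) counts as one escape
    return problems


if __name__ == "__main__":
    # An excerpt of the escaped pattern from the question.
    print(invalid_xsd_escapes(r"\(\\/\)31\)\1"))
    # A backslash-escaped slash, as copied from an online tester.
    print(invalid_xsd_escapes(r"0?2\/29"))
```

On the excerpt, only the backreference \1 is reported: \( and \) are valid escapes, and \\ is a valid escaped backslash followed by a plain /. On the second pattern, \/ is reported.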

Can anyone please help me get this right?

Thanks!


Solution

  • If you check the XSD regex syntax resources, you will notice that XSD regex supports neither non-capturing groups ((?:...)) nor backreferences (\n-style tokens such as \1 that refer back to the text captured by a capturing group (...)). The \1 to \4 backreferences in your pattern are what the invalid-escape error is reporting.

    Since the only separator is /, the backreferences are not needed at all: write / literally wherever a (\/) group or a \n backreference appeared.

    Use

    ((((0?[13578]|1[02])/31)/|((0?[13-9]|1[0-2])/(29|30)/))((1[6-9]|[2-9]\d)?\d{2})|(0?2/29/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))|(0?[1-9]|1[0-2])/(0?[1-9]|1\d|2[0-8])/(1[6-9]|[2-9]\d)?\d{2})
    

    See this regex demo

    Note that, according to regular-expressions.info:

    Particularly noteworthy is the complete absence of anchors like the caret and dollar, word boundaries, and lookaround. XML schema always implicitly anchors the entire regular expression. The regex must match the whole element for the element to be considered valid.

    You can read more about this in the "1.1. Note about anchors" section. So, do not use ^ (start of string) or $ (end of string) in an XSD regex: there they are not anchors but ordinary characters that match a literal ^ or $.

    The / symbol is escaped in regex flavors that use it as a regex delimiter. XSD regex has no delimiters (matching is the only operation, and there are no modifiers: XML schemas provide no way to specify matching modes), so do not escape / in an XSD regex. In fact, \/ is not a recognized XSD escape, so leaving it in produces the same invalid-escape error as the backreferences.

    TESTING AT ONLINE TESTERS NOTE

    If you test at regex101.com or similar sites, note that you usually need to escape / when it is selected as the regex delimiter. You can safely remove the \ before each / once you have finished testing.
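For removing the \ before each / after testing, here is a minimal Python sketch. `strip_slash_escapes` is a hypothetical helper, not a feature of regex101 or XMLSpy. It handles escapes pairwise, so other escapes such as \d, and an escaped backslash that happens to precede a /, are left untouched.

```python
import re


def strip_slash_escapes(pattern: str) -> str:
    """Hypothetical helper: replace each escaped slash with a plain slash."""
    # The regex consumes a backslash together with the character after it, so
    # escapes are handled pairwise: an escaped backslash followed by "/" stays
    # as it is, and other escapes such as \d are returned unchanged.
    return re.sub(
        r"\\(.)",
        lambda m: "/" if m.group(1) == "/" else m.group(0),
        pattern,
        flags=re.DOTALL,
    )


if __name__ == "__main__":
    print(strip_slash_escapes(r"(0?[1-9]|1[0-2])\/(0?[1-9]|1\d|2[0-8])\/\d{2}"))
```

Matching the backslash and its following character as one unit is what prevents a naive text replacement from corrupting an escaped backslash.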
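Without an XSD validator at hand, one way to sanity-check the backslash-free pattern recommended in this answer is to run it through Python's re module, which understands the subset it uses (groups, alternation, character classes, ?, {n}, \d), and compare it with the calendar. The sketch below assumes that this subset behaves the same in both flavors; Python's regex flavor is not XSD's, so a final check in the validator is still worthwhile. `re.fullmatch` stands in for XSD's implicit anchoring, and the helper names are hypothetical. It checks zero-padded MM/DD/YYYY strings only, although the pattern also accepts one-digit months and days and two-digit years.

```python
import re
from datetime import date

# The backslash-free pattern from the answer, split across lines for reading.
XSD_DATE_PATTERN = (
    r"((((0?[13578]|1[02])/31)/|((0?[13-9]|1[0-2])/(29|30)/))"
    r"((1[6-9]|[2-9]\d)?\d{2})"
    r"|(0?2/29/(((1[6-9]|[2-9]\d)?(0[48]|[2468][048]|[13579][26])"
    r"|((16|[2468][048]|[3579][26])00))))"
    r"|(0?[1-9]|1[0-2])/(0?[1-9]|1\d|2[0-8])/(1[6-9]|[2-9]\d)?\d{2})"
)
COMPILED = re.compile(XSD_DATE_PATTERN)


def is_accepted(value: str) -> bool:
    """True if the whole value matches, emulating XSD's implicit anchoring."""
    return COMPILED.fullmatch(value) is not None


def is_real_date(year: int, month: int, day: int) -> bool:
    """True if the (proleptic Gregorian) calendar has this day."""
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def disagreements(first_year: int, last_year: int) -> list[str]:
    """Zero-padded MM/DD/YYYY strings where the pattern and the calendar disagree."""
    found = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            for day in range(1, 32):
                text = f"{month:02d}/{day:02d}/{year}"
                if is_accepted(text) != is_real_date(year, month, day):
                    found.append(text)
    return found


if __name__ == "__main__":
    # 1600 is the earliest four-digit year the pattern allows.
    print(disagreements(1600, 2100))
```

An empty list means the pattern agrees with the calendar for every zero-padded date in the range, including the century rules (02/29/2000 accepted, 02/29/1900 rejected).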