regex rfc abnf

RFC regular expression operators


I recently read an RFC document and noticed that the regex operators it uses don't match the commonly known ones. For example:

date-time = [ day-of-week "," ] date time [CFWS]
year = (FWS 4*DIGIT FWS) / obs-year

In a regex, square brackets match exactly one of the characters inside them, but the RFC seems to interpret them as "optional". The same goes for the asterisk, which normally means that the preceding token occurs zero or more times. In the example we have

4*DIGIT

which, it is not difficult to guess, means 4 occurrences of the DIGIT token.

How should I interpret the operators used in RFC documents? Is there a document that describes their meaning?


Solution

  • The document (I believe) you're looking at, RFC 2822, says this:

    1.2.2. Syntactic notation

    This standard uses the Augmented Backus-Naur Form (ABNF) notation specified in [RFC2234] for the formal definitions of the syntax of messages.

    So, yes, the syntax is defined in RFC 2234, and it is ABNF, not regular expressions.

    A few sections specific to the block you've quoted:

    3.5 Sequence Group

    Elements enclosed in parentheses are treated as a single element, whose contents are STRICTLY ORDERED.

    3.6 Variable Repetition

    The operator "*" preceding an element indicates repetition. The full form is:

       <a>*<b>element
    

    where <a> and <b> are optional decimal values, indicating at least <a> and at most <b> occurrences of element.
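As a rough illustration of how this repetition prefix lines up with the regex quantifiers from the question, here is a minimal Python sketch. The helper name `abnf_repeat_to_regex` is made up for this example and is not part of any library. It relies on the RFC 2234 defaults: an omitted `<a>` means 0 and an omitted `<b>` means no upper limit. Under that reading, `4*DIGIT` means at least four digits, not exactly four.

```python
import re


def abnf_repeat_to_regex(prefix: str) -> str:
    """Translate an ABNF repetition prefix such as "4*" or "1*2"
    into the equivalent regex quantifier (hypothetical helper)."""
    low, star, high = prefix.partition("*")
    if not star:
        raise ValueError("expected a '*' in the repetition prefix")
    minimum = int(low) if low else 0  # omitted <a> defaults to 0
    if high:
        return "{%d,%d}" % (minimum, int(high))
    return "{%d,}" % minimum  # omitted <b> means no upper bound


# 4*DIGIT from the question: four or more digits.
# DIGIT is written here as the character class [0-9].
four_or_more_digits = re.compile("[0-9]" + abnf_repeat_to_regex("4*"))

print(abnf_repeat_to_regex("4*"))
print(bool(four_or_more_digits.fullmatch("2024")))
print(bool(four_or_more_digits.fullmatch("123")))
```

So a bare `*element` behaves like the regex `*`, `1*element` like `+`, and `1*2element` like `{1,2}`.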

    3.8 Optional Sequence

    Square brackets enclose an optional element sequence: