
Why is my RFC 2822 date not parsed by chrono?


I'm writing some code to parse RSS feeds but I have trouble with the Abstruse Goose RSS feed. If you look in that feed, dates are encoded as Mon, 06 Aug 2018 00:00:00 UTC. To me, it looks like RFC 2822.

I tried to parse it using chrono's DateTime::parse_from_rfc2822, but I get ParseError(NotEnough).

let pub_date = entry.pub_date().unwrap().to_owned();
return rfc822_sanitizer::parse_from_rfc2822_with_fallback(&pub_date)
    .unwrap_or_else(|e| {
        panic!(
            "pub_date for item {:?} (value is {:?}) can't be parsed due to error {:?}",
            &entry, pub_date, e
        )
    })
    .naive_utc();

Is there something I'm doing wrong? Do I have to hack it some way?

I use rfc822_sanitizer, which does a good job of fixing malformed dates (most of the time). I don't think it affects the parsing here... but who knows?


Solution

  • RFC 2822 specifies the date/time format precisely, with the following grammar:

    date-time       =       [ day-of-week "," ] date FWS time [CFWS]
    day-of-week     =       ([FWS] day-name) / obs-day-of-week
    day-name        =       "Mon" / "Tue" / "Wed" / "Thu" /
                            "Fri" / "Sat" / "Sun"
    date            =       day month year
    year            =       4*DIGIT / obs-year
    month           =       (FWS month-name FWS) / obs-month
    month-name      =       "Jan" / "Feb" / "Mar" / "Apr" /
                            "May" / "Jun" / "Jul" / "Aug" /
                            "Sep" / "Oct" / "Nov" / "Dec"
    day             =       ([FWS] 1*2DIGIT) / obs-day
    time            =       time-of-day FWS zone
    time-of-day     =       hour ":" minute [ ":" second ]
    hour            =       2DIGIT / obs-hour
    minute          =       2DIGIT / obs-minute
    second          =       2DIGIT / obs-second
    zone            =       (( "+" / "-" ) 4DIGIT) / obs-zone
    

    Where obs-zone is defined as follows:

    obs-zone        =       "UT" / "GMT" /          ; Universal Time
                                                    ; North American UT
                                                    ; offsets
                            "EST" / "EDT" /         ; Eastern:  - 5/ - 4
                            "CST" / "CDT" /         ; Central:  - 6/ - 5
                            "MST" / "MDT" /         ; Mountain: - 7/ - 6
                            "PST" / "PDT" /         ; Pacific:  - 8/ - 7
                            %d65-73 /               ; Military zones - "A"
                            %d75-90 /               ; through "I" and "K"
                            %d97-105 /              ; through "Z", both
                            %d107-122               ; upper and lower case
    

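    Something a lot of people get wrong when rolling their own timestamp generation library is exactly this point: how to properly label the time zone in an RFC 2822 date. Note that UTC does not appear anywhere in the grammar above; only UT and GMT are accepted names for Universal Time, so Mon, 06 Aug 2018 00:00:00 UTC is not a valid RFC 2822 date. The RFC says UT rather than UTC because the two are not exactly the same: UTC has additional (leap) seconds, while UT has... four variants! The RFC does not define which one is used, and they are all subtly different.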
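
To make the grammar concrete, here is a minimal sketch using only the standard library. Both function names (is_rfc2822_zone and normalize_utc_zone) are hypothetical helpers invented for this illustration; they are not part of chrono or rfc822_sanitizer. The second helper assumes that a feed writing UTC really means a zero offset, which is a guess about the feed's intent and not something the RFC states.

```rust
/// Returns true if `zone` matches the `zone` rule quoted above:
/// a numeric offset (sign plus four digits) or one of the `obs-zone` names.
fn is_rfc2822_zone(zone: &str) -> bool {
    // Named zones from `obs-zone`; ABNF string literals are case-insensitive.
    const NAMED: [&str; 10] = [
        "UT", "GMT", "EST", "EDT", "CST", "CDT", "MST", "MDT", "PST", "PDT",
    ];
    match zone.as_bytes() {
        // Numeric offset: ("+" / "-") 4DIGIT
        [b'+' | b'-', digits @ ..] => {
            digits.len() == 4 && digits.iter().all(u8::is_ascii_digit)
        }
        // Military zone: any single ASCII letter except "J" / "j"
        [c] => c.is_ascii_alphabetic() && !c.eq_ignore_ascii_case(&b'j'),
        // Everything else must be one of the named zones
        _ => NAMED.iter().any(|name| name.eq_ignore_ascii_case(zone)),
    }
}

/// Hypothetical workaround: if the last token of `date` is "UTC",
/// replace it with "+0000" (ASSUMPTION: the feed means a zero offset).
/// Any other input is returned with trailing whitespace trimmed.
fn normalize_utc_zone(date: &str) -> String {
    let trimmed = date.trim_end();
    match trimmed.rsplit_once(' ') {
        Some((head, zone)) if zone.eq_ignore_ascii_case("UTC") => {
            format!("{} +0000", head.trim_end())
        }
        _ => trimmed.to_string(),
    }
}
```

With these helpers, is_rfc2822_zone("UTC") is false while is_rfc2822_zone("UT") and is_rfc2822_zone("+0000") are true, and normalize_utc_zone turns the feed's date into a string whose zone a strict RFC 2822 parser should accept. Rewriting the zone before parsing is a pragmatic hack for a non-conforming feed, not a fix for the feed itself.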