I'm writing some code to parse RSS feeds but I have trouble with the Abstruse Goose RSS feed.
If you look in that feed, dates are encoded as Mon, 06 Aug 2018 00:00:00 UTC. To me, it looks like RFC 2822.
I tried to parse it using chrono's DateTime::parse_from_rfc2822, but I get ParseError(NotEnough).
let pub_date = entry.pub_date().unwrap().to_owned();
return rfc822_sanitizer::parse_from_rfc2822_with_fallback(&pub_date)
    .unwrap_or_else(|e| {
        panic!(
            "pub_date for item {:?} (value is {:?}) can't be parsed due to error {:?}",
            &entry, pub_date, e
        )
    })
    .naive_utc();
Is there something I'm doing wrong? Do I have to hack it some way?
I use rfc822_sanitizer, which does a good job of fixing badly written dates (most of the time). I don't think it affects the parsing, but who knows?
The RFC 2822 date/time format is codified in the RFC as follows:
date-time = [ day-of-week "," ] date FWS time [CFWS]
day-of-week = ([FWS] day-name) / obs-day-of-week
day-name = "Mon" / "Tue" / "Wed" / "Thu" /
"Fri" / "Sat" / "Sun"
date = day month year
year = 4*DIGIT / obs-year
month = (FWS month-name FWS) / obs-month
month-name = "Jan" / "Feb" / "Mar" / "Apr" /
"May" / "Jun" / "Jul" / "Aug" /
"Sep" / "Oct" / "Nov" / "Dec"
day = ([FWS] 1*2DIGIT) / obs-day
time = time-of-day FWS zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT / obs-hour
minute = 2DIGIT / obs-minute
second = 2DIGIT / obs-second
zone = (( "+" / "-" ) 4DIGIT) / obs-zone
where obs-zone is defined as follows:
obs-zone = "UT" / "GMT" / ; Universal Time
; North American UT
; offsets
"EST" / "EDT" / ; Eastern: - 5/ - 4
"CST" / "CDT" / ; Central: - 6/ - 5
"MST" / "MDT" / ; Mountain: - 7/ - 6
"PST" / "PDT" / ; Pacific: - 8/ - 7
%d65-73 / ; Military zones - "A"
%d75-90 / ; through "I" and "K"
%d97-105 / ; through "Z", both
%d107-122 ; upper and lower case
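To make the two productions above concrete, here is a minimal sketch in Rust that checks a zone label against `zone` and `obs-zone`. The names `NAMED_ZONES` and `is_rfc2822_zone` are made up for this illustration and are not part of chrono or rfc822_sanitizer; the sketch uses only the standard library and assumes the label has already been split off from the rest of the date.

```rust
// Named zones listed in the obs-zone production.
const NAMED_ZONES: [&str; 10] = [
    "UT", "GMT", "EST", "EDT", "CST", "CDT", "MST", "MDT", "PST", "PDT",
];

/// Hypothetical helper: returns true if `zone` matches the `zone`
/// production, i.e. a numeric offset or one of the obs-zone labels.
fn is_rfc2822_zone(zone: &str) -> bool {
    let bytes = zone.as_bytes();
    // ( "+" / "-" ) 4DIGIT
    if bytes.len() == 5 && (bytes[0] == b'+' || bytes[0] == b'-') {
        return bytes[1..].iter().all(|b| b.is_ascii_digit());
    }
    // Military zones: any single ASCII letter except "J" / "j".
    if bytes.len() == 1 {
        return bytes[0].is_ascii_alphabetic() && !bytes[0].eq_ignore_ascii_case(&b'j');
    }
    // ABNF string literals are case-insensitive, so compare that way.
    NAMED_ZONES.iter().any(|name| name.eq_ignore_ascii_case(zone))
}
```

With this check, `+0000`, `GMT` and `UT` are accepted, while `UTC` is rejected because it matches neither the numeric form nor any obs-zone label.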
Something a lot of people get wrong when rolling their own timestamp generation library is exactly this point: how to properly label an RFC 2822 time zone offset. The grammar accepts UT and GMT for universal time, but not UTC, so a date ending in UTC is not valid RFC 2822. The label is UT rather than UTC because the two are not exactly the same (one has additional seconds, the other has... four variants! And the RFC does not define which one is used; they're all subtly different).
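If the feed cannot be fixed at the source, one possible workaround is to rewrite the offending label before parsing. The sketch below is only an illustration: `normalize_utc_zone` is a hypothetical helper, not something chrono or rfc822_sanitizer provides, and it assumes that treating `UTC` as a zero offset is accurate enough for feed dates.

```rust
/// Hypothetical helper: replaces a trailing " UTC" label with the
/// numeric offset "+0000", which the `zone` production does allow.
/// Trailing whitespace is trimmed; other input is returned unchanged.
fn normalize_utc_zone(date: &str) -> String {
    let trimmed = date.trim_end();
    match trimmed.strip_suffix(" UTC") {
        // Assumption: UTC is close enough to a zero offset here.
        Some(prefix) => format!("{} +0000", prefix),
        None => trimmed.to_string(),
    }
}
```

The returned string for the feed's example date is `Mon, 06 Aug 2018 00:00:00 +0000`, which can then be handed to `DateTime::parse_from_rfc2822` as before.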