python datetime timezone format rfc5322

Parsing date with timezone from an email?


I am trying to retrieve the date from an email. At first it's easy:

import email.parser

message = email.parser.Parser().parse(file)  # file: an open file object holding the message
date = message['Date']
print(date)

and I receive:

'Mon, 16 Nov 2009 13:32:02 +0100'

But I need a nice datetime object, so I use:

datetime.strptime('Mon, 16 Nov 2009 13:32:02 +0100', '%a, %d %b %Y %H:%M:%S %Z')

which raises ValueError, since %Z isn't the right directive for +0100. But I can't find the proper format for a time zone offset in the documentation; there is only %Z for the zone name. Can someone help me with that?


Solution

  • email.utils has a parsedate() function for the RFC 2822 format, which as far as I know is not deprecated.

    >>> import email.utils
    >>> import time
    >>> import datetime
    >>> email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')
    (2009, 11, 16, 13, 32, 2, 0, 1, -1)
    >>> time.mktime((2009, 11, 16, 13, 32, 2, 0, 1, -1))
    1258378322.0
    >>> datetime.datetime.fromtimestamp(1258378322.0)
    datetime.datetime(2009, 11, 16, 13, 32, 2)
    

    Please note, however, that parsedate() ignores the time zone offset in the string, and time.mktime() always interprets its tuple as local time. Two dates that differ only in their offset therefore produce the same timestamp:

    >>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) ==
    ... time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0100')))
    True
    

    So you'll still need to parse out the time zone and correct for the local time difference, too. time.timezone is the local offset in seconds west of UTC (ignoring DST), so it is subtracted along with the remote offset:

    >>> REMOTE_TIME_ZONE_OFFSET = +9 * 60 * 60
    >>> (time.mktime(email.utils.parsedate('Mon, 16 Nov 2009 13:32:02 +0900')) -
    ... time.timezone - REMOTE_TIME_ZONE_OFFSET)
    1258345922.0
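
As a possible shortcut for the manual offset arithmetic above, email.utils also provides parsedate_tz() and mktime_tz(), which keep the offset from the header and apply it for you. A minimal sketch, assuming Python 3 (the names raw, parsed, sender_tz and aware are only illustrative):

```python
import email.utils
from datetime import datetime, timedelta, timezone

raw = 'Mon, 16 Nov 2009 13:32:02 +0100'

# parsedate_tz returns a 10-tuple; the last item is the UTC offset in
# seconds (None when the header carries no usable zone).
parsed = email.utils.parsedate_tz(raw)
offset_seconds = parsed[9]

# mktime_tz applies that offset and returns a UTC-based epoch timestamp,
# independent of the local time zone of the machine running the code.
timestamp = email.utils.mktime_tz(parsed)

# Build an aware datetime that keeps the sender's original offset
# (falling back to UTC if no offset was parsed).
sender_tz = timezone(timedelta(seconds=offset_seconds or 0))
aware = datetime.fromtimestamp(timestamp, tz=sender_tz)

print(offset_seconds)     # 3600
print(timestamp)          # 1258374722
print(aware.isoformat())  # 2009-11-16T13:32:02+01:00
```

On Python 3.3 and later, email.utils.parsedate_to_datetime(raw) should give an equivalent aware datetime in a single call. Python 3.2 and later also accept %z in strptime for offsets like +0100, though %a and %b stay locale-dependent, so the email.utils route is probably the safer one for mail headers.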