pythonpandas

SAS date9 format to python datetime


I've got a CSV formatted with SAS's date9. output. But dateutil.parser.parse doesn't recognise it and nor does pandas.to_datetime (also a question, does to_datetime use dateutil.parser?).

They throw:

  File "C:\redacted\dateutil\parser.py", line 559, in parse
    raise ValueError("Unknown string format")

ValueError: Unknown string format

The format looks like this:

0         12FEB1993
1         12FEB1993
...
16        16NOV1972
17        01SEP2001
18        06JUL2000
19        01FEB2014

With everything that matters in the second column. Is there any library that can solve this or do I have to roll my own? And if it is the latter, how?


Solution

  • Can you pass in the format argument to pandas.to_datetime()? Something like: pandas.to_datetime(date_column, format='%d%b%Y')

    If you can get the dates as strings, then any date-parsing function that takes strptime formatting should work with the pattern %d%b%Y:

    >>> datetime.datetime.strptime('12FEB1993', '%d%b%Y')
    datetime.datetime(1993, 2, 12, 0, 0)
    

    EDIT:

    It looks like pandas.to_datetime() results in pandas.Timestamp objects, which, due to resolution limits are limited to ~584 years, or a max year of 2262.

    Because you have date-string that extend beyond that, you could read the column in as a string, and then call apply to convert the values into date objects:

    import datetime
    my_df['date'] = my_df['date_text'].apply(lambda x: datetime.datetime.strptime(x, '%d%b%Y').date())