I've got a CSV formatted with SAS's date9.
output. But dateutil.parser.parse
doesn't recognise it and nor does pandas.to_datetime
(also a question, does to_datetime
use dateutil.parser
?).
They throw:
File "C:\redacted\dateutil\parser.py", line 559, in parse
raise ValueError("Unknown string format")
ValueError: Unknown string format
The format looks like this:
0 12FEB1993
1 12FEB1993
...
16 16NOV1972
17 01SEP2001
18 06JUL2000
19 01FEB2014
With everything that matters in the second column. Is there any library that can solve this or do I have to roll my own? And if it is the latter, how?
Can you pass in the format
argument to pandas.to_datetime()
? Something like: pandas.to_datetime(date_column, format='%d%b%Y')
If you can get the dates as strings, then any date-parsing function that takes strptime
formatting should work with the pattern %d%b%Y
:
>>> datetime.datetime.strptime('12FEB1993', '%d%b%Y')
datetime.datetime(1993, 2, 12, 0, 0)
EDIT:
It looks like pandas.to_datetime()
results in pandas.Timestamp
objects, which, due to resolution limits are limited to ~584 years, or a max year of 2262.
Because you have date-string that extend beyond that, you could read the column in as a string, and then call apply
to convert the values into date
objects:
import datetime
my_df['date'] = my_df['date_text'].apply(lambda x: datetime.datetime.strptime(x, '%d%b%Y').date())