pythonpython-3.xpandasdatetimestring-to-datetime

Set the format for pd.to_datetime


Hi already referred to this post but I could not get through my issue. I have a column in my CSV which is string and the sample values are below (note that the month and year positioning are sometimes reversed). What format do I need to set in my to_datetime? I tried all the below approaches

df = pd.read_csv("filename.csv") #Imagine there is a Month column

#[1] df["Month"] = pd.to_datetime(df["Month"])
#[2] df["Month"] = pd.to_datetime(df["Month"], format="%m/%d/%Y")

[Month]
Mar-97
Apr-97
May-97
Jun-97
Nov-00
Dec-00
1-Jan
1-Feb
1-Mar
1-Apr

I get the error

ValueError: day is out of range for month

for [1] and I get

ValueError: time data 'Mar-97' does not match format '%m/%d/%Y' (match)

for [2]. I tried to remove the %d too but no luck. Could you please point me what is going wrong here.


Solution

  • One way is to use try / except with pd.Series.apply:

    s = pd.Series(['Mar-97', 'May-97', 'Nov-00', '1-Jan', '1-Mar'])
    
    def converter(x):
        try:
            return pd.datetime.strptime(x, '%b-%y')
        except ValueError:
            year, month = x.split('-')  # split by delimiter
            x = year.zfill(2) + '-' + month  # %y requires 0-padding
            return pd.datetime.strptime(x, '%y-%b')
    
    res = s.apply(converter)
    
    print(res)
    
    0   1997-03-01
    1   1997-05-01
    2   2000-11-01
    3   2001-01-01
    4   2001-03-01
    dtype: datetime64[ns]
    

    Since we have defined converter as a function, we can use this directly with pd.read_csv:

    df = pd.read_csv('file.csv', parse_dates=['dt_col_name'], date_parser=converter)
    

    Python's strftime directives is a useful reference for constructing datetime format strings.