pythonpandasdataframedatetimeparse-error

Pandas to_datetime parsing error: Unknown string format


I have a column in my pandas dataframe:

Start Date
1/7/13
1/7/13
1/7/13
16/7/13
16/7/13

When I convert it into a date format, I am getting the following error:

data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
...
...
/Library/Python/2.7/site-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
    381                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    382             except (ValueError, TypeError):
--> 383                 raise e
    384 
    385     if arg is None:

ValueError: Unknown string format

What am I missing here?


Solution

  • I think the problem is in data - a problematic string exists. So you can try check length of the string in column Start Date:

    import pandas as pd
    import io
    
    temp=u"""Start Date
    1/7/13
    1/7/1
    1/7/13 12 17
    16/7/13
    16/7/13"""
    
    data = pd.read_csv(io.StringIO(temp), sep=";", parse_dates=False)
    
    #data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
    print data
    
         Start Date
    0        1/7/13
    1         1/7/1
    2  1/7/13 12 17
    3       16/7/13
    4       16/7/13
    
    #check, if length is more as 7
    print data[data['Start Date'].str.len() > 7]
    
         Start Date
    2  1/7/13 12 17
    

    Or you can try to find these problematic row different way e.g. read only part of the datetime and check parsing datetime:

    #read first 3 rows
    data= data.iloc[:3]
    
    data['Start Date']= pd.to_datetime(data['Start Date'],dayfirst=True)
    

    But this is only tips.

    EDIT:

    Thanks joris for suggestion add parameter errors ='coerce' to to_datetime:

    temp=u"""Start Date
    1/7/13
    1/7/1
    1/7/13 12 17
    16/7/13
    16/7/13 12 04"""
    
    data = pd.read_csv(io.StringIO(temp), sep=";")
    #add parameter errors coerce
    data['Start Date']= pd.to_datetime(data['Start Date'], dayfirst=True, errors='coerce')
    print data
    
      Start Date
    0 2013-07-01
    1 2001-07-01
    2        NaT
    3 2013-07-16
    4        NaT
    
    #index of data with null - NaT to variable idx
    idx = data[data['Start Date'].isnull()].index
    print idx
    
    Int64Index([2, 4], dtype='int64')
    
    #read csv again
    data = pd.read_csv(io.StringIO(temp), sep=";")
    
    #find problematic rows, where datetime is not parsed
    print data.iloc[idx]
    
          Start Date
    2   1/7/13 12 17
    4  16/7/13 12 04