
pandas reading dates from csv in yy-mm-dd format


I have a CSV file with dates displayed in dd-mmm-yy format, and I want to read them as yyyy-mm-dd. The parse_dates option works, but it does not convert dates before 2000 correctly. Example: the actual date is 01-Aug-1968. It is displayed as 01-Aug-68. Pandas date parsing with correction=true reads the date as 01-Aug-2068.

Is there any option in pandas to read dates before 2000 correctly?


Solution

  • from dateutil.relativedelta import relativedelta
    import datetime
    import pandas as pd
    

    Let's assume you have a CSV like this:

    mydates
    18-Aug-68
    13-Jul-45
    12-Sep-00
    20-Jun-10
    15-Jul-60
    

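    Define your date format (%y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999, which is why 68 comes back as 2068):

    d = lambda x: datetime.datetime.strptime(x, '%d-%b-%y')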
    

    Put a constraint on the parsed dates: any date that lands in the future is moved back 100 years

    dateparse = lambda x: d(x) if d(x) < datetime.datetime.now() else d(x) - relativedelta(years=100) 
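
The same rule can be written as a named function, which is easier to test than the lambda. This is only a sketch: `parse_two_digit_year` and its `today` parameter are made-up names for illustration, and it swaps `relativedelta` for the standard library's `datetime.replace` so it needs no extra packages.

```python
import datetime


def parse_two_digit_year(text, today=None):
    """Parse 'dd-Mon-yy' text and move any future date back 100 years."""
    # 'today' is a hypothetical parameter so the cutoff can be pinned in tests
    if today is None:
        today = datetime.datetime.now()
    # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999
    parsed = datetime.datetime.strptime(text, "%d-%b-%y")
    if parsed > today:
        # Dates in the future are assumed to belong to the previous century
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


samples = ["18-Aug-68", "13-Jul-45", "12-Sep-00", "20-Jun-10", "15-Jul-60"]
cutoff = datetime.datetime(2024, 1, 1)  # fixed cutoff, for a repeatable demo
fixed = [parse_two_digit_year(s, today=cutoff).date().isoformat() for s in samples]
```

With the cutoff pinned to 2024-01-01, `fixed` holds the same five dates as the result table in this answer. Note that the rule assumes the file contains no genuinely future dates.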
    

    Read your CSV:

    df = pd.read_csv("myfile.csv", parse_dates=['mydates'], date_parser=dateparse)
    

    Here is your result:

    print(df)
    
         mydates
    0 1968-08-18
    1 1945-07-13
    2 2000-09-12
    3 2010-06-20
    4 1960-07-15
    

    Voilà
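
Recent pandas releases document `date_parser` as deprecated, so it may be worth doing the correction after reading instead. The sketch below reads the column as text, parses it with `pd.to_datetime`, and then shifts future dates back a century. The in-memory CSV and the fixed `cutoff` are assumptions made to keep the example repeatable; with a real file you would pass the path to `read_csv` and could use `pd.Timestamp.now()` as the cutoff.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for myfile.csv
csv_text = "mydates\n18-Aug-68\n13-Jul-45\n12-Sep-00\n20-Jun-10\n15-Jul-60\n"

df = pd.read_csv(io.StringIO(csv_text))

# Parse with an explicit format; two-digit years 00-68 come back as 20xx
df["mydates"] = pd.to_datetime(df["mydates"], format="%d-%b-%y")

# Assumed cutoff for the demo; anything later is treated as a 19xx date
cutoff = pd.Timestamp("2024-01-01")
in_future = df["mydates"] > cutoff

# Keep dates at or before the cutoff, replace the rest with the shifted value
shifted = df["mydates"] - pd.DateOffset(years=100)
df["mydates"] = df["mydates"].where(~in_future, shifted)

print(df)
```

This handles the whole column at once rather than calling a Python function per value, and it gives the same five dates as the result above.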