python datetime

Python convert mm/dd/yyyy to yyyymmdd using date_format


I have a CSV file whose rows look roughly like this:

field1,bmm/bdd/byyyy,emm/edd/eyyyy,field4....

I am successfully creating a JSON file like this:

{
    "field1": [
        {
            "begDate": byyyybmmbdd,
            "endDate": eyyyyemmedd,
            "score": field4,
.....

The Python script I was using works fine, but it gives me a deprecation warning:

import json
from datetime import datetime

import pandas as pd

dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv("input.csv", parse_dates=['Start', 'End'], date_parser=dateparse)
df['Start'] = df['Start'].astype(str)
df['End'] = df['End'].astype(str)
df['score'] = df['score'].round(decimals=3)
res = {}
for a1, df_gp in df.groupby('field1'):
    res[a1] = df_gp.drop(columns='field1').to_dict(orient='records')
print(json.dumps(res, indent=4).lower())

FutureWarning: The argument 'date_parser' is deprecated and will be removed in a future version.
Please use 'date_format' instead, or read your data in as 'object' dtype and then call 'to_datetime'.

I'd like to be able to run the script without the warning, so I modified it accordingly:

dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y')
df = pd.read_csv("input.csv", parse_dates=['Start', 'End'], date_format=dateparse)


I also tried this:

dateparse = lambda x: datetime.strptime(x, '%m/%d/%Y').strftime("%Y%m%d")
df = pd.read_csv("input.csv", parse_dates=['Start', 'End'], date_format=dateparse)

but in both cases the JSON output has the wrong date format:

{
    "field1": [
        {
            "begDate": bmm/bdd/byyyy,
            "endDate": emm/edd/eyyyy,
            "score": 0.0,
....

Are there any suggestions on how to get rid of this warning while still getting the desired output?


Solution

  • The attempts above fail because date_format expects a format string (such as '%m/%d/%Y'), not a callable, so the old date_parser lambda cannot simply be passed to it. To avoid the deprecation warning, read the date columns as plain strings (object dtype), convert them with pd.to_datetime, and then use .dt.strftime to get the output format you want. For example:

    import json

    import pandas as pd

    # Read the CSV without parse_dates, so 'Start' and 'End' stay as plain strings
    df = pd.read_csv("input.csv")

    # Parse the mm/dd/yyyy strings, then format them as yyyymmdd strings
    df['Start'] = pd.to_datetime(df['Start'], format='%m/%d/%Y').dt.strftime('%Y%m%d')
    df['End'] = pd.to_datetime(df['End'], format='%m/%d/%Y').dt.strftime('%Y%m%d')

    df['score'] = df['score'].round(3)

    # Group by 'field1' and build the desired JSON structure
    res = {}
    for a1, df_gp in df.groupby('field1'):
        res[a1] = df_gp.drop(columns='field1').to_dict(orient='records')

    print(json.dumps(res, indent=4).lower())

    This way, you get rid of the warning and still achieve your desired JSON output.
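
    The warning itself also suggests date_format, which can be kept in read_csv as long as it is given a format string rather than a function. The following is a minimal sketch of that variant, assuming pandas 2.0 or later (where date_format was added). The inline CSV text and its values are made-up sample data standing in for input.csv, with column names taken from the question's script.

```python
import io
import json

import pandas as pd

# Hypothetical sample data standing in for input.csv (values are invented)
csv_text = """field1,Start,End,score
A,01/15/2023,02/20/2023,0.12345
A,03/01/2023,03/31/2023,0.5
B,12/31/2022,01/01/2023,1.0
"""

# date_format takes a strptime-style string, not a callable (pandas >= 2.0)
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["Start", "End"],
    date_format="%m/%d/%Y",
)

# The columns are now datetime64; format them as yyyymmdd strings
for col in ("Start", "End"):
    df[col] = df[col].dt.strftime("%Y%m%d")

df["score"] = df["score"].round(3)

# Group by 'field1' and build the nested structure
res = {
    key: group.drop(columns="field1").to_dict(orient="records")
    for key, group in df.groupby("field1")
}

print(json.dumps(res, indent=4).lower())
```

    Note that strftime produces strings, so the dates appear quoted in the JSON. If unquoted numbers are needed, the formatted column could be converted with .astype(int) before building the dictionary.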