
How do I make pandas.read_csv() parse one column as datetime while treating all others as strings?


I am using pandas.read_csv to load a CSV file. I want the file read mostly as-is, with no automatic data type conversions, except for one column (send_date) that should be parsed as a datetime.

The reason I want most columns read as strings or objects is to preserve data like zip codes with leading zeros (04321) and Boolean-like values (true, false, unknown) that are stored as strings.

Problem

Using read_csv without specifying dtype causes unwanted type conversions.

df = pandas.read_csv("test.csv", parse_dates=['send_date'])
# name: Madeline (type: object)                         - correct
# zip_code: 4321 (type: int64)                          - wrong (missing leading 0)
# send_date: 2025-04-13 00:00:00 (type: datetime64[ns]) - correct
# is_customer: True (type: bool)                        - wrong (not a string)

Using dtype=object correctly preserves zip_code and is_customer as string-like values, but it prevents send_date from being set to type datetime64[ns].

df = pandas.read_csv("test.csv", dtype=object, parse_dates=['send_date'])
# name: Madeline (type: object)                 - correct
# zip_code: 04321 (type: object)                - correct
# send_date: 2025-04-13 00:00:00 (type: object) - wrong (not datetime)
# is_customer: true (type: object)              - correct

Manually setting the dtype for send_date to datetime64 raises an error.

df = pandas.read_csv("test.csv", dtype={"send_date":"datetime64"}, parse_dates=['send_date'])
# TypeError: the dtype datetime64 is not supported for parsing, pass this column using parse_dates instead

Setting dtype=str causes send_date to be interpreted as an integer timestamp.

df = pandas.read_csv("test.csv", dtype=str, parse_dates=['send_date'])
# name: Madeline (type: object)                 - correct
# zip_code: 04321 (type: object)                - correct
# send_date: 1744502400000000000 (type: object) - wrong (not a date)
# is_customer: true (type: object)              - correct

Sample Data (test.csv)

name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false

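Desired output

# name: Madeline (type: object)
# zip_code: 04321 (type: object)
# send_date: 2025-04-13 00:00:00 (type: datetime64[ns])
# is_customer: true (type: object)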

Attempted Code

import pandas

def print_first_row_value_and_dtype(df: pandas.DataFrame):
    row = df.iloc[0]
    for col in df.columns:
        print(f"{col}: {row[col]} (type: {df[col].dtype})")

filename = 'test.csv'

df = pandas.read_csv(filename, parse_dates=['send_date'])  
print_first_row_value_and_dtype(df)

df = pandas.read_csv(filename, dtype=object, parse_dates=['send_date'])
print_first_row_value_and_dtype(df)

df = pandas.read_csv(filename, dtype=str, parse_dates=['send_date'])
print_first_row_value_and_dtype(df)

dtypes = {"name":"object", "zip_code":"object", "send_date":"datetime64", "is_customer":"object"}
df = pandas.read_csv(filename, dtype=dtypes, parse_dates=['send_date']) # raises TypeError

Question

How can I make pandas.read_csv() parse one column (send_date) as a datetime while treating all other columns as strings or objects to avoid unwanted data type conversions?


Solution

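  • Call read_csv with dtype="string" and parse_dates=['send_date']. The "string" alias selects pandas' StringDtype extension type instead of plain object, so zip_code and is_customer keep their original text, and send_date is still parsed as datetime64[ns].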

    Code

    import pandas
    
    df = pandas.read_csv("test.csv", dtype="string", parse_dates=['send_date'])
    
    print(df.dtypes)
    # name           string[python]
    # zip_code       string[python]
    # send_date      datetime64[ns]
    # is_customer    string[python]
    # dtype: object
    
    print(df)
    #        name zip_code  send_date is_customer
    # 0  Madeline    04321 2025-04-13        true
    # 1      Theo    32255 2025-04-08        true
    # 2    Granny    84564 2025-04-15       false
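
If a blanket dtype gets in the way of date parsing on your pandas version (as the dtype=str attempt in the question did), one alternative is to leave send_date out of the dtype mapping altogether. This is a minimal sketch, not part of the original answer. It assumes comma-separated input, and the inline csv_text plus the date_col and string_dtypes names are illustrative.

```python
import io
import pandas

# Inline copy of the sample data so the sketch runs without test.csv on disk.
csv_text = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false
"""

date_col = "send_date"  # illustrative name for the column to parse as a date

# Read only the header row to learn the column names.
columns = pandas.read_csv(io.StringIO(csv_text), nrows=0).columns

# Force every column except the date column to be read as a string.
string_dtypes = {col: str for col in columns if col != date_col}

df = pandas.read_csv(
    io.StringIO(csv_text),
    dtype=string_dtypes,
    parse_dates=[date_col],
)
print(df.dtypes)
```

Because send_date has no entry in the dtype mapping, parse_dates is the only setting that applies to it. The cost is a second pass over the header row.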
    

    Input file (test.csv)

    name,zip_code,send_date,is_customer
    Madeline,04321,2025-04-13,true
    Theo,32255,2025-04-08,true
    Granny,84564,2025-04-15,false
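
Another option that avoids combining dtype with parse_dates is to read every column as a string and convert send_date afterwards with pandas.to_datetime. This is a hedged sketch rather than the answer's own method. It assumes the dates are always in YYYY-MM-DD form, and it uses an inline copy of the sample data (csv_text is an illustrative name) in place of test.csv.

```python
import io
import pandas

# Inline copy of the sample data so the sketch runs without test.csv on disk.
csv_text = """name,zip_code,send_date,is_customer
Madeline,04321,2025-04-13,true
Theo,32255,2025-04-08,true
Granny,84564,2025-04-15,false
"""

# Step 1: read everything as text, so no column is type-inferred.
df = pandas.read_csv(io.StringIO(csv_text), dtype=str)

# Step 2: convert only the date column. An explicit format (assumed here)
# makes unexpected values raise an error instead of being guessed at.
df["send_date"] = pandas.to_datetime(df["send_date"], format="%Y-%m-%d")

print(df.dtypes)
print(df)
```

The explicit conversion step is also a natural place to handle bad dates, for example by passing errors="coerce" to pandas.to_datetime to turn unparseable values into NaT.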