I have a csv file which looks like this
Study ID,CG_Arrival_Date/Time,Arrival_Date,Arrival_Time
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00
I wanted to parse arrival date and arrival time, so I worte this on Google Colab.
df = pd.read_csv('/content/SM_AI_Data.csv', index_col=['Study ID'],parse_dates=[['Arrival Date','Arrival Time']])
df.head()
The columns get combined fine, but its data type is still object.
How can I change the code so that it works?
Please see the details above.
I can't reproduce any problem if I specify the columns correctly. I use the indexes here because I'm too lazy to use the names :
import pandas as pd
from io import StringIO
csv_text="""
Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00"""
df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[1,2])
df.dtypes
This produces
Arrival_Date/Time datetime64[ns]
Arrival_Date datetime64[ns]
Arrival_Time object
The contents are:
Arrival_Date/Time Arrival_Date Arrival_Time
Study ID
2 2011-01-01 00:03:00 2011-01-01 0:03:00
3 2011-01-01 00:53:00 2011-01-01 0:53:00
The Date and Time columns can be combined too :
import pandas as pd
from io import StringIO
csv_text="""
Study ID,Arrival_Date/Time,Arrival_Date,Arrival_Time
2,1/1/2011 0:03,1/1/2011,0:03:00
3,1/1/2011 0:53,1/1/2011,0:53:00"""
df = pd.read_csv(StringIO(csv_text), index_col=['Study ID'],parse_dates=[[2,3],1])
df.dtypes
The contents are
Arrival_Date_Arrival_Time Arrival_Date/Time
Study ID
2 2011-01-01 00:03:00 2011-01-01 00:03:00
3 2011-01-01 00:53:00 2011-01-01 00:53:00
It's not possible to tell what the format is from the values. DD/MM/YYYY or MM/DD/YYYY? The correct format can be specified by setting the dayfirst
parameter to True
or False`