
Assign pandas dataframe column dtypes


I want to set the dtypes of multiple columns in a pd.DataFrame. (I have a file that I had to parse manually into a list of lists, because it was not amenable to pd.read_csv.)

import pandas as pd
print(pd.DataFrame([['a','1'],['b','2']],
                   dtype={'x':'object','y':'int'},
                   columns=['x','y']))

I get

ValueError: entry not a 2- or 3- tuple

The only way I can set them is by looping through each column variable and recasting with astype.

dtypes = {'x':'object','y':'int'}
mydata = pd.DataFrame([['a','1'],['b','2']],
                      columns=['x','y'])
for c in mydata.columns:
    mydata[c] = mydata[c].astype(dtypes[c])
print(mydata['y'].dtype)   #=> int64

Is there a better way?


Solution

  • Since pandas 0.17, you have to use the explicit conversion functions:

    pd.to_datetime, pd.to_timedelta and pd.to_numeric
    

    (As mentioned below, there is no more "magic": convert_objects was deprecated in 0.17.)

    df = pd.DataFrame({'x': {0: 'a', 1: 'b'}, 'y': {0: '1', 1: '2'}, 'z': {0: '2018-05-01', 1: '2018-05-02'}})
    
    df.dtypes
    
    x    object
    y    object
    z    object
    dtype: object
    
    df
    
       x  y           z
    0  a  1  2018-05-01
    1  b  2  2018-05-02
    

    You can apply these to each column you want to convert:

    df["y"] = pd.to_numeric(df["y"])
    df["z"] = pd.to_datetime(df["z"])
    df
    
       x  y          z
    0  a  1 2018-05-01
    1  b  2 2018-05-02
    
    df.dtypes
    
    x            object
    y             int64
    z    datetime64[ns]
    dtype: object
    

    The df.dtypes output confirms that the dtypes of y and z have been updated.
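For the original question of setting several column dtypes at once, DataFrame.astype also accepts a dict that maps column names to dtypes, so the explicit loop can usually be dropped. The following is a minimal sketch, assuming a pandas version recent enough to support the dict form; the variable names rows and dtypes are only illustrative.

```python
import pandas as pd

# Same list-of-lists data as in the question; every value starts out as a string.
rows = [['a', '1'], ['b', '2']]

# Target dtype per column (illustrative mapping, mirroring the question).
dtypes = {'x': 'object', 'y': 'int64'}

# Build the frame first, then cast all listed columns in a single call.
mydata = pd.DataFrame(rows, columns=['x', 'y']).astype(dtypes)

print(mydata.dtypes)
```

Note that astype raises if a value cannot be cast (for example a non-numeric string in y), so pd.to_numeric may still be the better fit for messy input.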


    OLD/DEPRECATED ANSWER for pandas 0.12 - 0.16: You can use convert_objects to infer better dtypes:

    In [21]: df
    Out[21]: 
       x  y
    0  a  1
    1  b  2
    
    In [22]: df.dtypes
    Out[22]: 
    x    object
    y    object
    dtype: object
    
    In [23]: df.convert_objects(convert_numeric=True)
    Out[23]: 
       x  y
    0  a  1
    1  b  2
    
    In [24]: df.convert_objects(convert_numeric=True).dtypes
    Out[24]: 
    x    object
    y     int64
    dtype: object
    

    Magic! (Sad to see it deprecated.)
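On versions where convert_objects is no longer available, a similar "convert what you can" effect can be approximated by trying pd.to_numeric on each column and leaving a column untouched when parsing fails. This is only a sketch under that assumption, not a drop-in replacement; the helper name convert_numeric_columns is hypothetical and not part of pandas.

```python
import pandas as pd


def convert_numeric_columns(frame):
    """Return a copy of frame with every fully numeric-looking column converted.

    Hypothetical helper (not a pandas API): columns that pd.to_numeric
    cannot parse are kept exactly as they were.
    """
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # At least one value is not numeric; leave this column alone.
            pass
    return out


df = pd.DataFrame({'x': ['a', 'b'], 'y': ['1', '2']})
converted = convert_numeric_columns(df)
print(converted.dtypes)
```

Unlike the old method, this converts a column only when every value in it parses, and it does not modify the original frame.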