
Why do dtypes not change when updating columns in Pandas 2.x, although they did change in Pandas 1.x?


When changing the values and/or dtypes of specific columns with df.loc[:, col] = ..., Pandas 1.x and Pandas 2.x behave differently.

For example, column e in the example below becomes datetime64[ns] in Pandas 1.5.2 but remains object in Pandas 2.1.4.

What change from Pandas 1.x to 2.x explains this behavior?

Example code

import pandas as pd

# Creates example DataFrame
df = pd.DataFrame({
    'a': ['1', '2'],
    'b': ['1.0', '2.0'],
    'c': ['True', 'False'],
    'd': ['2024-03-07', '2024-03-06'],
    'e': ['07/03/2024', '06/03/2024'],
    'f': ['aa', 'bb'],
})

# Changes dtypes of existing columns
df.loc[:, 'a'] = df.a.astype('int')
df.loc[:, 'b'] = df.b.astype('float')
df.loc[:, 'c'] = df.c.astype('bool')

# Parses and changes dates dtypes
df.loc[:, 'd'] = pd.to_datetime(df.d)
df.loc[:, 'e'] = pd.to_datetime(df.e, format='%d/%m/%Y')

# Changes values of existing columns
df.loc[:, 'f'] = df.f + 'cc'

# Creates new column
df.loc[:, 'g'] = [1, 2]

Results in Pandas 1.5.2

In [2]: df
Out[2]: 
   a    b     c          d          e     f  g
0  1  1.0  True 2024-03-07 2024-03-07  aacc  1
1  2  2.0  True 2024-03-06 2024-03-06  bbcc  2

In [3]: df.dtypes
Out[3]: 
a             int64
b           float64
c              bool
d    datetime64[ns]
e    datetime64[ns]
f            object
g             int64
dtype: object

Results in Pandas 2.1.4

In [2]: df
Out[2]: 
   a    b     c                    d                    e     f  g
0  1  1.0  True  2024-03-07 00:00:00  2024-03-07 00:00:00  aacc  1
1  2  2.0  True  2024-03-06 00:00:00  2024-03-06 00:00:00  bbcc  2

In [3]: df.dtypes
Out[3]: 
a    object
b    object
c    object
d    object
e    object
f    object
g     int64
dtype: object

Solution

  • From What’s new in 2.0.0 (April 3, 2023):

    Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333).

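    So in Pandas 2+, setting values with df.loc[:, foo] = bar first tries to write them into the existing column in place. If that succeeds, no new column is created and the existing column's dtype is preserved. In the example, columns a to f start out as object, which can hold any Python value, so the in-place write succeeds and the dtype stays object. Column g does not exist yet, so it is created with an inferred dtype (int64).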

    Compare this with df[foo] = bar: this replaces the column with a new one whose dtype is inferred from the values being set. So df['d'] = pd.to_datetime(df.d) produces a column with dtype datetime64[ns], even in Pandas 2+.
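
    As a minimal sketch of that difference, the conversions from the question can be rewritten with plain column assignment instead of .loc. This is an illustration, not the only possible fix. It uses a shortened version of the question's DataFrame, and it spells out 'int64' and 'float64' as an assumption to avoid platform-dependent defaults. The exact datetime resolution (for example [ns]) may vary with the pandas version.

```python
import pandas as pd

# Shortened version of the DataFrame from the question
df = pd.DataFrame({
    'a': ['1', '2'],
    'b': ['1.0', '2.0'],
    'd': ['2024-03-07', '2024-03-06'],
    'e': ['07/03/2024', '06/03/2024'],
})

# df[col] = ... replaces the whole column, so the dtype of the
# right-hand side is kept instead of the old object dtype
df['a'] = df['a'].astype('int64')
df['b'] = df['b'].astype('float64')
df['d'] = pd.to_datetime(df['d'])
df['e'] = pd.to_datetime(df['e'], format='%d/%m/%Y')

print(df.dtypes)
```

    With this form, columns a and b should report numeric dtypes and columns d and e a datetime dtype, rather than object.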