When changing the values and/or dtypes of specific columns, there is a difference in behaviour between Pandas 1.x and 2.x.
For example, for column e in the example below:
In Pandas 1.x, using pd.to_datetime to update the column parses the date and changes its dtype.
In Pandas 2.x, using pd.to_datetime to update the column parses the date but does not change its dtype.
What change from Pandas 1.x to 2.x explains this behavior?
Example code
import pandas as pd
# Creates example DataFrame
df = pd.DataFrame({
'a': ['1', '2'],
'b': ['1.0', '2.0'],
'c': ['True', 'False'],
'd': ['2024-03-07', '2024-03-06'],
'e': ['07/03/2024', '06/03/2024'],
'f': ['aa', 'bb'],
})
# Changes dtypes of existing columns
df.loc[:, 'a'] = df.a.astype('int')
df.loc[:, 'b'] = df.b.astype('float')
df.loc[:, 'c'] = df.c.astype('bool')
# Parses and changes dates dtypes
df.loc[:, 'd'] = pd.to_datetime(df.d)
df.loc[:, 'e'] = pd.to_datetime(df.e, format='%d/%m/%Y')
# Changes values of existing columns
df.loc[:, 'f'] = df.f + 'cc'
# Creates new column
df.loc[:, 'g'] = [1, 2]
Results in Pandas 1.5.2
In [2]: df
Out[2]:
a b c d e f g
0 1 1.0 True 2024-03-07 2024-03-07 aacc 1
1 2 2.0 True 2024-03-06 2024-03-06 bbcc 2
In [3]: df.dtypes
Out[3]:
a int64
b float64
c bool
d datetime64[ns]
e datetime64[ns]
f object
g int64
dtype: object
Results in Pandas 2.1.4
In [2]: df
Out[2]:
a b c d e f g
0 1 1.0 True 2024-03-07 00:00:00 2024-03-07 00:00:00 aacc 1
1 2 2.0 True 2024-03-06 00:00:00 2024-03-06 00:00:00 bbcc 2
In [3]: df.dtypes
Out[3]:
a object
b object
c object
d object
e object
f object
g int64
dtype: object
From What’s new in 2.0.0 (April 3, 2023):
Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333).
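So in Pandas 2+, whenever you set values with df.loc[:, foo] = bar, pandas first tries to write them into the existing column in place. If that succeeds, no new column is created and the existing column's dtype is preserved. In the example, columns a to f start out as object (they hold strings), and an object column can hold ints, floats, bools and Timestamps, so every in-place set succeeds and all six columns stay object. Column g did not exist before, so it is created with a dtype inferred from its values (int64).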
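A minimal sketch of this in-place behaviour for column e alone, assuming pandas 2.0 or later (the 1.5.2 results above show that 1.x produces datetime64 instead). Passing dtype=object explicitly is an addition not in the original example; it pins the starting column to object regardless of pandas' default string inference:

```python
import pandas as pd

# Column 'e' as in the example; dtype=object is set explicitly so the
# starting dtype does not depend on pandas' default string inference.
df = pd.DataFrame({'e': ['07/03/2024', '06/03/2024']}, dtype=object)

# The parsed Series itself has a datetime64 dtype...
parsed = pd.to_datetime(df['e'], format='%d/%m/%Y')
print(parsed.dtype)

# ...but .loc[:, 'e'] = ... writes into the existing object column in place
# (pandas >= 2.0), so the column keeps its object dtype and simply holds
# Timestamp objects.
df.loc[:, 'e'] = parsed
print(df['e'].dtype)         # object
print(type(df.loc[0, 'e']))  # pandas Timestamp
```

The dates are parsed correctly either way; only the container dtype differs, which is why the 2.1.4 output prints full "2024-03-07 00:00:00" timestamps in an object column.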
Compare this with df[foo] = bar: this replaces the column with a new one whose dtype is inferred from the values being set. So df['d'] = pd.to_datetime(df.d) gives column d the dtype datetime64[ns], even in Pandas 2+.
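For comparison, a sketch of the original example rewritten with plain df[col] = ... assignment instead of df.loc[:, col] = .... Each assignment replaces the column, so the dtypes come out as in the Pandas 1.5.2 results. Two small assumptions differ from the original code: dtype=object pins the starting columns to object, and 'int64' replaces 'int' so the integer width does not depend on the platform:

```python
import pandas as pd

# Same data as the example; dtype=object pins the starting dtype to object.
df = pd.DataFrame({
    'a': ['1', '2'],
    'b': ['1.0', '2.0'],
    'c': ['True', 'False'],
    'd': ['2024-03-07', '2024-03-06'],
    'e': ['07/03/2024', '06/03/2024'],
    'f': ['aa', 'bb'],
}, dtype=object)

# df[col] = ... replaces each column, so its dtype is inferred from the new values
df['a'] = df['a'].astype('int64')
df['b'] = df['b'].astype('float64')
df['c'] = df['c'].astype('bool')  # non-empty strings are truthy, so 'False' -> True
df['d'] = pd.to_datetime(df['d'])
df['e'] = pd.to_datetime(df['e'], format='%d/%m/%Y')
df['f'] = df['f'] + 'cc'
df['g'] = [1, 2]

print(df.dtypes)
```

Note that the astype('bool') call reproduces the True/True values seen in both result tables above; it converts by truthiness, not by parsing the strings 'True' and 'False'.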