I have a problem with pandas Series automatic dtype conversion. Let's take a simple DataFrame:
import pandas as pd

data = {
    "t": [1, 2, 3],
    "x": [1.7, 8.5, 4.3]
}
df = pd.DataFrame(data)
print(df)
Here I have two columns: "t" is of type int and "x" is of type float. Now, if I want to extract a single row from this df:
row_as_ser = df.iloc[0]
t_row_as_ser = row_as_ser["t"]
print(row_as_ser)
The value from the "t" column gets converted to float. How can I disable this behaviour, or re-convert the values to int if possible?
I know a Series holds a single dtype, so I would like it to be "object".
I tried the convert_dtypes function, but it keeps a float. I know I can re-convert each element one by one, but I have a lot of columns whose names can change.
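For reference, a minimal sketch reproducing the upcast (and showing that casting the whole frame to object beforehand avoids it, at the cost of a copy and of losing the numeric column dtypes; variable names are illustrative):

import pandas as pd

df = pd.DataFrame({"t": [1, 2, 3], "x": [1.7, 8.5, 4.3]})

row = df.iloc[0]
print(row.dtype)        # float64: the int from "t" was upcast
print(type(row["t"]))   # <class 'numpy.float64'>

# Casting the frame to object before extracting keeps the original types.
row_obj = df.astype(object).iloc[0]
print(type(row_obj["t"]))  # <class 'int'>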
For the moment, I extract my row using the query function:
row_as_df = df.query("t == 3")  # let's name it row_as_df because query returns a DataFrame
t_row_as_df = row_as_df["t"]
wanted_row = row_as_df.iloc[0]  # the data I want, but it converts values like "t" to float
The data types are preserved because row_as_df is a DataFrame, but I can't use it directly: the variable t_row_as_df is a Series, so I would need t_row_as_df.values[0] to get the scalar. My row is an argument to a function that should preferably accept a Series, and t_row_as_ser is a plain scalar (so it does not have a .values attribute).
Just to prove that what I want is possible: if I add a column "s" containing strings to the original DataFrame, then the extracted wanted_row is a Series of dtype "object" and the value of "t" stays an int, as illustrated below.
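To illustrate that point (the extra column "s" is just for this example):

import pandas as pd

df2 = pd.DataFrame({
    "t": [1, 2, 3],
    "x": [1.7, 8.5, 4.3],
    "s": ["a", "b", "c"],  # the string column forces the row to dtype object
})

wanted_row = df2.iloc[0]
print(wanted_row.dtype)       # object
print(type(wanted_row["t"]))  # an integer type, not float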
Here's one approach:
m = df['t'] == 3

# cast to object only when the selected slice mixes dtypes,
# otherwise squeeze directly to keep the shared dtype
s = (s.astype(object).squeeze()
     if (s := df.loc[m]).dtypes.nunique() > 1
     else s.squeeze()
     )
Output (OP's mixed dtypes, i.e. df = pd.DataFrame(data)):
t      3
x    4.3
Name: 2, dtype: object
Output (uniform dtypes, e.g. df = pd.DataFrame(data).astype(int)):
t    3
x    4
Name: 2, dtype: int32
Explanation
Use df.loc with boolean indexing (or indeed df.query) to select a df slice. If Series.nunique on df.dtypes is > 1, we need dtype object; else apply df.squeeze immediately to preserve the dtype already shared. This way you will only adjust the dtype when necessary.
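The same logic can be wrapped in a small helper (a sketch; the name row_as_series is mine, not a pandas API):

import pandas as pd

def row_as_series(df: pd.DataFrame, mask) -> pd.Series:
    """Return the single row selected by `mask`,
    cast to object only if the slice mixes dtypes."""
    s = df.loc[mask]
    return (s.astype(object) if s.dtypes.nunique() > 1 else s).squeeze()

row = row_as_series(df, df['t'] == 3)
print(type(row['t']))  # int: preserved via the object cast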
Slightly fancier would be to accept all integer widths as compatible:
import numpy as np
import pandas as pd

df = pd.DataFrame({'t': pd.Series([1, 2], dtype=np.int32),
                   'x': pd.Series([3, 4], dtype=np.int64)}
                  )

m = df['t'] == 2

# map every integer dtype to a common int64, so that mixed
# integer widths alone do not trigger the object cast
integers = lambda dtype: (np.int64
                          if np.issubdtype(dtype, np.integer)
                          else dtype)

s = (s.astype(object).squeeze()
     if (s := df.loc[m]).dtypes.map(integers).nunique() > 1
     else s.squeeze()
     )
Output:
t 2
x 4
Name: 1, dtype: int64
Here we map df.dtypes to int64 wherever np.issubdtype with np.integer is True, before checking the number of unique dtypes.
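The same mapping idea extends to floats, if you consider all float widths compatible too (a sketch; the unify helper below is hypothetical, not part of the code above):

import numpy as np

def unify(dtype):
    # Collapse all integer widths to int64 and all float widths
    # to float64 before comparing; anything else is left as-is.
    if np.issubdtype(dtype, np.integer):
        return np.int64
    if np.issubdtype(dtype, np.floating):
        return np.float64
    return dtype

# usage: swap `integers` for `unify` in the dtypes check:
# (s := df.loc[m]).dtypes.map(unify).nunique() > 1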