python pandas

Pandas Series dtype conversion


I have a problem with automatic dtype conversion in a pandas Series. Take a simple DataFrame:

import pandas as pd

data = {
   "t": [1, 2, 3],
   "x": [1.7, 8.5, 4.3]
}
df = pd.DataFrame(data)
print(df)

Here I have two columns: "t" is of type int and "x" is of type float. Now I want to extract a single row from this df:

row_as_ser = df.iloc[0]
t_row_as_ser = row_as_ser["t"]
print(row_as_ser)

The value from the "t" column gets converted to float. How can I disable this behaviour, or convert the values back to int if possible?

I know a Series has a single dtype, so I would like that dtype to be "object".

I tried the convert_dtypes function, but it keeps a float. I know I can convert each element back one by one, but I have a lot of columns, and their names can change.

For the moment, I extract my row using the query function:

row_as_df = df.query("t == 3")  # named row_as_df because query returns a DataFrame
t_row_as_df = row_as_df["t"]
wanted_row = row_as_df.iloc[0]  # the data I want, but values like "t" are converted to float

The data types are preserved in row_as_df because it is a DataFrame, but I can't use it directly: t_row_as_df is a Series, so I would need t_row_as_df.values[0] to get the value. My row is passed as an argument to a function that should preferably accept a Series by default, and in that case t_row_as_ser is a scalar (so it does not have a .values attribute).

Just to prove that what I want is possible: if I have a column "s" in the original dataframe that contains strings, then the extracted wanted_row would be a Series of dtype "object" and the value of "t" would be an int.
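
The difference can be checked directly. This is a minimal sketch using the same data; the string column "s" and its values are made up for illustration:

```python
import pandas as pd

data = {"t": [1, 2, 3], "x": [1.7, 8.5, 4.3]}

# Numeric-only frame: the extracted row is upcast to a common float dtype.
numeric_row = pd.DataFrame(data).iloc[0]

# With an extra string column (illustrative "s"), the row can only be
# represented as dtype object, so each value keeps its own type.
mixed_row = pd.DataFrame({**data, "s": ["a", "b", "c"]}).iloc[0]

print(numeric_row.dtype)  # float64
print(mixed_row.dtype)    # object
print(mixed_row["t"])     # 1
```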


Solution

  • Here's one approach:

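    m = df['t'] == 3

    # Select the row as a DataFrame (column dtypes preserved), then cast to
    # object only if the columns have more than one dtype, and squeeze to a Series.
    s = (s.astype(object).squeeze()
         if (s:=df.loc[m]).dtypes.nunique() > 1
         else s.squeeze()
         )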
    

    Output:

    # for `df = pd.DataFrame(data)`
    s
    
    t      3
    x    4.3
    Name: 2, dtype: object
    
    # for `df = pd.DataFrame(data).astype(int)`
    # (the default integer width is platform-dependent, so this may show int64)
    s
    
    t    3
    x    4
    Name: 2, dtype: int32
    

    Explanation

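    df.loc[m] returns a DataFrame, so each column keeps its own dtype. If the columns have more than one dtype, casting to object before squeeze() prevents pandas from upcasting the row to a common numeric type. If all columns share one dtype, the row is squeezed as-is. This way you only adjust the dtype when necessary.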


    A slightly fancier version treats all integer dtypes as compatible, so that a mix of, say, int32 and int64 columns does not trigger the conversion to object:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'t': pd.Series([1,2], dtype=np.int32),
                       'x': pd.Series([3,4], dtype=np.int64)}
                      )

    m = df['t'] == 2
    # Map every integer dtype to np.int64 so int32 and int64 count as one dtype.
    integers = lambda dtype: (np.int64
                              if np.issubdtype(dtype, np.integer)
                              else dtype)

    s = (s.astype(object).squeeze()
         if (s:=df.loc[m]).dtypes.map(integers).nunique() > 1
         else s.squeeze()
         )
    

    Output:

    t    2
    x    4
    Name: 1, dtype: int64
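
Since the question mentions passing the row to a function, the two variants above could be wrapped in a small helper. This is only a sketch: the name row_as_series and the single-row check are assumptions added for illustration and are not part of pandas. It also uses iloc[0] in place of squeeze(), so a one-column frame still yields a Series.

```python
import numpy as np
import pandas as pd


def row_as_series(df, mask):
    """Return the single row selected by `mask` as a Series.

    Hypothetical helper (not a pandas API). The row is cast to object
    only when its columns do not share one dtype; all integer widths
    are treated as the same dtype.
    """
    rows = df.loc[mask]
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, got {len(rows)}")
    # Collapse every integer dtype to int64 before counting distinct dtypes.
    kinds = rows.dtypes.map(
        lambda dt: np.dtype("int64") if pd.api.types.is_integer_dtype(dt) else dt
    )
    if kinds.nunique() > 1:
        rows = rows.astype(object)
    return rows.iloc[0]


df = pd.DataFrame({"t": [1, 2, 3], "x": [1.7, 8.5, 4.3]})
row = row_as_series(df, df["t"] == 3)
print(row.dtype)  # object
print(row["t"])   # 3
```

Raising on zero or multiple matches is a design choice made here so that the function always returns a Series; the original expressions would return an empty or multi-row DataFrame from squeeze() in those cases.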