python, pandas, dtype

How to create a pandas Series with a dtype which is a subclass of float?


I would like to create a pandas Series whose elements are of a type derived from float. However, pandas automatically casts them back to float:

import pandas as pd

class PValue(float):
    def __str__(self):
        if self < 1e-4:
            return '<1e-4'
        return super().__str__()


s = pd.Series([0.1, 0.12e-5])
s = s.map(PValue)

print(s.apply(type)) # -> returns `float`, but I want to get `PValue`

Solution

  • I think you'd need to write a pandas extension type (a custom ExtensionDtype with its own ExtensionArray) to get it to work how you want.

    But, a class with only one method probably shouldn't be a class. Check out Stop Writing Classes by Jack Diederich from PyCon 2012. You can do the same thing with a formatter function:

    def pvalue(x: float) -> str:
        if x < 1e-4:
            return '<1e-4'
        return str(x)
    

    Then for example:

    s = pd.Series([0.1, 0.12e-5])
    with pd.option_context('display.float_format', pvalue):
        print(s)  # a bare `s` inside a `with` block displays nothing
    
    0     0.1
    1   <1e-4
    dtype: float64
    

    Or, for use in a dataframe, if you don't want to format all the columns as pvalues, use a style:

    pd.DataFrame({'p': s}).style.format({'p': pvalue})
    

    This is shown in Jupyter as an HTML table like this:

            p
    0     0.1
    1   <1e-4
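
Outside Jupyter, where the Styler's HTML output is not much use, a similar per-column effect can probably be had from `DataFrame.to_string` and its `formatters` argument. The following is only a sketch reusing the `pvalue` function from above; the variable names `df` and `text` are just for illustration. It also checks that the column itself stays `float64`, since only the rendered text changes:

```python
import pandas as pd


def pvalue(x: float) -> str:
    # Same formatter as above: collapse very small p-values into a label.
    if x < 1e-4:
        return '<1e-4'
    return str(x)


df = pd.DataFrame({'p': [0.1, 0.12e-5]})

# `formatters` maps a column name to a one-argument function that
# returns the string to display for each value in that column.
text = df.to_string(formatters={'p': pvalue})
print(text)

# The underlying data is untouched, so numeric operations still work.
print(df['p'].dtype)
```

Because the values remain plain floats, comparisons such as `df['p'] < 0.05` keep working, which would not be the case if the column were converted to strings.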