python pandas

Formatting integers in pandas dataframe


I've read the documentation and simply cannot understand why I can't seem to achieve my objective.

All I want to do is output integers with a thousands separator where appropriate.

I'm loading a spreadsheet from my local machine that is in the public domain here

Here's my MRE:

import pandas as pd

WORKBOOK = "/Volumes/Spare/Downloads/prize-june-2025.xlsx"

def my_formatter(v):
    return f"{v:,d}" if isinstance(v, int) else v

df = pd.read_excel(WORKBOOK, header=2, usecols="B,C,E:H")
print(df.dtypes)
df.style.format(my_formatter)
print(df.head())

Output:

Prize Value                    int64
Winning Bond NO.              object
Total V of Holding             int64
Area                          object
Val of Bond                    int64
Dt of Pur             datetime64[ns]
dtype: object
   Prize Value Winning Bond NO.  Total V of Holding                Area  Val of Bond  Dt of Pur
0      1000000      103FE583469               50000           Stockport         5000 2005-11-29
1      1000000      352AC359547               50000  Edinburgh, City Of         5000 2019-02-11
2       100000      581WF624503               50000          Birmingham        20000 2024-06-03
3       100000      265SM364866               50000       Hertfordshire        32500 2016-01-31
4       100000      570HE759643               11000       Hertfordshire        11000 2024-02-22

I have determined that my_formatter() is never called and I have no idea why.


Solution

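  • Your approach works fine; however, style does not modify the DataFrame in place. Instead, df.style.format(...) returns a Styler object that can be displayed (for instance in a notebook) or exported to a file. The formatter is only called when that object is rendered, which is why my_formatter() never runs in your script and print(df.head()) still shows the raw integers.

    In Jupyter, you can see the HTML version with: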

    df.style.format(my_formatter)
    

    (This must be the last statement of the cell, so that Jupyter displays its value.)

    Or a text version with:

    print(df.style.format(my_formatter).to_string())
    

    Note, however, that your approach is quite slow. If you have homogeneous dtypes, you can take advantage of the built-in thousands parameter:

    df.style.format(thousands=',')
    

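    Or, if you want a custom format per column, pass a dictionary mapping column names to format strings (the d format code only accepts integers, so this assumes every numeric column is integer-typed, as in your data):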

    df.style.format({c: '{:,d}' for c in df.select_dtypes('number')})
    

    Finally, if you want to convert the data to strings and get a DataFrame back, use DataFrame.map (named applymap before pandas 2.1):

    out = df.map(my_formatter)
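
A minimal sketch of that last option, again on hypothetical data. The hasattr fallback is only an assumption for running on pandas older than 2.1, where the method is still called applymap:

```python
import pandas as pd

# Hypothetical stand-in data; the real workbook is not reproduced here.
df = pd.DataFrame({
    "Prize Value": [1000000, 100000],
    "Area": ["Stockport", "Birmingham"],
})

def my_formatter(v):
    # Add a thousands separator to integers, leave everything else alone.
    return f"{v:,d}" if isinstance(v, int) else v

# DataFrame.map exists from pandas 2.1; fall back to applymap otherwise.
elementwise = df.map if hasattr(df, "map") else df.applymap
out = elementwise(my_formatter)

print(out)  # a new DataFrame whose integer cells are now strings
```

Unlike the Styler route, this returns real data, so out no longer holds numbers: sorting or arithmetic on those columns will treat them as text.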