pythonexcelpandaspython-polars

Polars.write_excel: How to remove thousand separator for i64 & f64 and remove trailing zero for f64 efficiently?


SOLUTION as of 16JUL25:

  1. See rotabor's float_precision answer for trailing zero problem.

  2. To solve thousands separator problem gracefully without unnecessary steps, do NOT bother using polars.write_excel for now. Use either pandas.to_excel or polars.write_csv instead. polars.write_excel's default engine: xlsxwriter is not fully mature yet.


I used to code in python-pandas for my day-to-day work and now I'm moving to python-polars for a good change since it improves my productivity.

Libraries I use: polars as pl polars.selectors as sc

In my current project, I imported my data via pl.read_excel('abc.xlsx', engine='openpyxl') instead of the default engine since it gave me error on column that it can't detect dtype.

My data consists of all kind of dtypes, and in particular, Int64 and Float64. Prior to exporting my data to .xlsx Excel workbook, I have had to use df = df.with_columns(sc.by_dtype(pl.Float64).round(2)) to round all Float64 columns to 2 decimal places. After that, I exported my data using pl.write_excel('abc_v2.xlsx').

(1) Here comes the first problem: The Float64 columns shown in python are of 2 decimal places, but an extra trailing zero is added to the back of values in the output Excel file.

Note: The original data in Excel are of 2 decimal places without trailing zero in all Float64 columns.

(2) My second problem: Both Int64 and Float64 columns shown in python are without thousand separator but the thousand separator is introduced in the output Excel file.

Note: The original data in Excel does not come with thousand separator in all Int64 and Float64 columns.

Is there any fix to this problem with polars.write_excel()? I have no such issue with pandas.to_excel().

---EDIT--- Added my codes below for your convenience of testing. The 'sum_per_group', 'Amount' and 'Total Amount' consist of floating-point numbers with 2 decimal places (Float64 format) whereas 'ID' consists of 6 to 7 digits integers (Int64 format).

import polars as pl
import polars.selectors as sc

df = pl.read_excel('../Documents/abc.xlsx', engine='openpyxl')

df = df.with_columns(sum_per_group = pl.col('Amount').sum().over('ID').sort_by('ID'))

df = df.with_columns(sc.by_dtype(pl.Float64).round(2))

df = df.with_columns([
    pl.when(pl.col('sum_per_group') != pl.col('Total Amount'))
    .then(pl.lit('No'))
    .otherwise(pl.lit('Yes'))
    .alias('Flag')
])

df.write_excel('abc_v2.xlsx')

---EDIT2--- Added a picture of my OS regional settings

enter image description here


Solution

  • Some suggestions for you:

    1. Don't mix the cell content and its representation. The representation is defined by the cell format. To control thousand separator appearance and number of decimal digits, set the appropriate cell format.
    2. By default, Excel doesn't apply the thousand separator when the number is entered the cell. The displayed number of decimal digits depends on the column width.
    3. If the thousand separator appears on df = df.with_columns(sc.by_dtype(pl.Float64).round(2)), it can be sign that Excel perhaps receives the text not a number and is not able to convert this text to the number. Check it for sure and correct if necessary.
    4. Focus on dtype_formats and float_precision arguments of polars.DataFrame.write_excel.