SOLUTION as of 16JUL25:
See rotabor's float_precision answer for trailing zero problem.
To solve thousands separator problem gracefully without unnecessary steps, do NOT bother using polars.write_excel
for now. Use either pandas.to_excel
or polars.write_csv
instead. polars.write_excel
's default engine: xlsxwriter
is not fully mature yet.
I used to code in python-pandas for my day-to-day work and now I'm moving to python-polars for a good change since it improves my productivity.
Libraries I use:
polars as pl
polars.selectors as sc
In my current project, I imported my data via pl.read_excel('abc.xlsx', engine='openpyxl')
instead of the default engine since it gave me error on column that it can't detect dtype.
My data consists of all kind of dtypes, and in particular, Int64
and Float64
. Prior to exporting my data to .xlsx
Excel workbook, I have had to use df = df.with_columns(sc.by_dtype(pl.Float64).round(2))
to round all Float64
columns to 2 decimal places. After that, I exported my data using pl.write_excel('abc_v2.xlsx')
.
(1) Here comes the first problem: The Float64
columns shown in python are of 2 decimal places, but an extra trailing zero is added to the back of values in the output Excel file.
Note: The original data in Excel are of 2 decimal places without trailing zero in all Float64
columns.
(2) My second problem: Both Int64
and Float64
columns shown in python are without thousand separator but the thousand separator is introduced in the output Excel file.
Note: The original data in Excel does not come with thousand separator in all Int64
and Float64
columns.
Is there any fix to this problem with polars.write_excel()
? I have no such issue with pandas.to_excel()
.
---EDIT---
Added my codes below for your convenience of testing. The 'sum_per_group', 'Amount' and 'Total Amount' consist of floating-point numbers with 2 decimal places (Float64
format) whereas 'ID' consists of 6 to 7 digits integers (Int64
format).
import polars as pl
import polars.selectors as sc
df = pl.read_excel('../Documents/abc.xlsx', engine='openpyxl')
df = df.with_columns(sum_per_group = pl.col('Amount').sum().over('ID').sort_by('ID'))
df = df.with_columns(sc.by_dtype(pl.Float64).round(2))
df = df.with_columns([
pl.when(pl.col('sum_per_group') != pl.col('Total Amount'))
.then(pl.lit('No'))
.otherwise(pl.lit('Yes'))
.alias('Flag')
])
df.write_excel('abc_v2.xlsx')
---EDIT2--- Added a picture of my OS regional settings
Some suggestions for you:
df = df.with_columns(sc.by_dtype(pl.Float64).round(2))
, it can be sign that Excel perhaps receives the text not a number and is not able to convert this text to the number. Check it for sure and correct if necessary.