excelexport-to-excelpython-polars

Truncate string lengths in Polars so they don't exceed Excel character limit


I am currently able to truncate an individual column's str values to below the excel character limit (failing to do so means the following cells are blank using xlsxwriter), but I'm looking to do so across every/any column that meets that condition - potentially something using pl.all().

So far, this is what I have to apply to each column that has values exceeding the limit:

df.with_columns(pl.when(pl.col('col1').str.len_chars() >= 32_767)
                  .then(pl.lit('Too many to display - ') + pl.col('col1').str.slice(0, 10_000))
                  .otherwise(pl.col('col1'))
                )

Solution

  • pl.all() should work if you use .name.keep()

    df = pl.from_repr("""
    ┌─────────┬─────────┬────────┬──────┐
    │ col1    ┆ col2    ┆ col3   ┆ col4 │
    │ ---     ┆ ---     ┆ ---    ┆ ---  │
    │ str     ┆ str     ┆ str    ┆ i64  │
    ╞═════════╪═════════╪════════╪══════╡
    │ aaaaaaa ┆ b       ┆ c      ┆ 1    │
    │ a       ┆ bb      ┆ cccccc ┆ 2    │
    │ aa      ┆ bbbbbbb ┆ cccc   ┆ 3    │
    └─────────┴─────────┴────────┴──────┘
    """)
    

    You can use pl.col(pl.String) to select just string columns (if there are other types):

    df.with_columns(
       pl.when(pl.col(pl.String).str.len_chars() >= 3)
         .then('[*] ' + pl.col(pl.String).str.slice(0, 3))
         .otherwise(pl.col(pl.String))
         .name.keep()
    )
    
    shape: (3, 4)
    ┌─────────┬─────────┬─────────┬──────┐
    │ col1    ┆ col2    ┆ col3    ┆ col4 │
    │ ---     ┆ ---     ┆ ---     ┆ ---  │
    │ str     ┆ str     ┆ str     ┆ i64  │
    ╞═════════╪═════════╪═════════╪══════╡
    │ [*] aaa ┆ b       ┆ c       ┆ 1    │
    │ a       ┆ bb      ┆ [*] ccc ┆ 2    │
    │ aa      ┆ [*] bbb ┆ [*] ccc ┆ 3    │
    └─────────┴─────────┴─────────┴──────┘