
Is there a Pandas Profiling-like implementation built on Polars?


We use Pandas and Pandas Profiling extensively in our projects to generate profile reports. We are exploring Polars as a Pandas alternative and would like to know whether anything like Pandas Profiling has been built on top of Polars.

I searched before posting this question and did not find any similar implementations. Is anyone aware of one?


Solution

  • I'm not aware of any project implemented natively with Polars. That said, there's an easy way to use Pandas Profiling with Polars.

    From the Other DataFrame libraries page of the Pandas Profiling documentation:

    "If you have data in another framework of the Python Data ecosystem, you can use pandas-profiling by converting to a pandas DataFrame, as direct integrations are not yet supported."

    On the above page, you'll see suggestions for using Pandas Profiling with other dataframe libraries, such as Modin, Vaex, PySpark, and Dask.

    We can do the same thing easily with Polars, using the to_pandas method.

    Adapting an example from the Quick Start Guide to use Polars:

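    import numpy as np
    import polars as pl
    # Newer releases of this package are published as ydata-profiling (import ydata_profiling).
    from pandas_profiling import ProfileReport

    # 100 rows x 5 columns of random floats; recent Polars versions take the
    # column names via `schema` (older versions used `columns`).
    df = pl.DataFrame(np.random.rand(100, 5), schema=["a", "b", "c", "d", "e"], orient="row")

    # Convert to pandas before profiling (to_pandas needs pandas installed, and pyarrow in most Polars versions).
    profile = ProfileReport(df.to_pandas(), title="Pandas Profiling Report")

    # Write the report to an HTML file.
    profile.to_file("your_report.html")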
    

    In general, you're always one method call away from plugging Polars into any framework that uses Pandas. I myself use to_pandas so that I can use Polars with my favorite graphing library, plotnine.

    (As an aside, thank you for sharing the Pandas Profiling project here. I was quite impressed with the output generated, and will probably use it on projects going forward.)