Tags: python, plotly-python, python-polars

How to use Polars with Plotly without converting to Pandas?


I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the process.

Consider the following test data:

import polars as pl
import numpy as np
import plotly.express as px

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)

fig = px.bar(df, x='names', y='random')
fig.show()

I would like this code to show the bar chart in a Jupyter notebook, but instead it fails with the following message:

/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/frame.py:1483: UserWarning: accessing series as Attribute of a DataFrame is deprecated
  warnings.warn("accessing series as Attribute of a DataFrame is deprecated")

It is possible to transform the Polars data frame to a Pandas data frame with df = df.to_pandas(). Then, it works. However, is there another, simpler and more elegant solution?


Solution

  • Yes, there is no need to convert to a Pandas dataframe. Someone (sa-) has requested better support for this here and included a workaround:

    "The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is too big"

    For the OP's code example, specifying the dataframe columns explicitly works.
    In addition to passing the columns directly with px.bar(x=df["names"], y=df["random"]) or px.bar(df, x=df["names"], y=df["random"]), I find that casting them to lists also works:

    import polars as pl
    import numpy as np
    import plotly.express as px
    
    df = pl.DataFrame(
        {
            "nrs": [1, 2, 3, None, 5],
            "names": ["foo", "ham", "spam", "egg", None],
            "random": np.random.rand(5),
            "groups": ["A", "A", "B", "C", "B"],
        }
    )
    
    px.bar(df, x=list(df["names"]), y=list(df["random"]))
    

    Once you see the idea behind the workaround, you may find other options if you know Polars well.

    The example posted in that request is simpler: instead of px.line(df, x="a", y="b"), which you could use for a Pandas dataframe, you use px.line(x=df["a"], y=df["b"]). With Polars, that is:

    import polars as pl
    import plotly.express as px
    
    df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})
    
    px.line(x=df["a"], y=df["b"])
    

    (Note that using plotly.express requires Pandas to be installed; see here and here. I used plotly.express in my answer because it is closer to the OP's code. The code could be adapted to plotly.graph_objects if the goal is to not have Pandas installed or involved at all.)