I would like to replace Pandas with Polars, but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to cut Pandas out of the process completely.
Consider the following test data:
import polars as pl
import numpy as np
import plotly.express as px
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
fig = px.bar(df, x='names', y='random')
fig.show()
I would like this code to show the bar chart in a Jupyter notebook, but instead it fails with the following message:
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/frame.py:1483: UserWarning: accessing series as Attribute of a DataFrame is deprecated
warnings.warn("accessing series as Attribute of a DataFrame is deprecated")
It is possible to convert the Polars data frame to a Pandas data frame with df = df.to_pandas(). Then it works. However, is there a simpler and more elegant solution?
Yes, there is no need to convert to a Pandas dataframe. A user (sa-) has requested better support for this here and included a workaround:
"The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is too big"
For the OP's code example, the approach of specifying the dataframe columns explicitly works.
I find that, in addition to specifying the dataframe columns with px.bar(x=df["names"], y=df["random"]) or px.bar(df, x=df["names"], y=df["random"]), casting to a list also works:
import polars as pl
import numpy as np
import plotly.express as px
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
px.bar(df, x=list(df["names"]), y=list(df["random"]))
If you know Polars well, you may see other options once you understand the idea behind the workaround.
The example posted there is simpler: instead of px.line(df, x="a", y="b"), as you could use with a Pandas dataframe, you use px.line(x=df["a"], y=df["b"]). With Polars, that is:
import polars as pl
import plotly.express as px
df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})
px.line(x=df["a"], y=df["b"])
(Note that using plotly.express requires Pandas to be installed; see here and here. I used plotly.express in my answer because it was closer to the OP's code. The code could be adapted to use plotly.graph_objects if you want to avoid having Pandas installed or involved at all.)