visual-studio-code jupyter python-polars

How to properly display a Polars dataframe in VSCode Jupyter Notebook variables inspector


Edit 2 (01.08.2024):
I believe VSCode has now moved to the Data Wrangler extension as its default data inspector and will deprecate the built-in one.
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.datawrangler


Edit: This has been filed as a bug in the Polars repository: https://github.com/pola-rs/polars/issues/6152
And in the VSCode Jupyter repo: https://github.com/microsoft/vscode-jupyter/issues/12519


I am testing Python-Polars inside a Jupyter notebook in VSCode.

When I open a data frame from the variable view, it is not formatted correctly.

It is displayed like this: [screenshot]

Columns and rows are swapped, and the column names are missing.

I would have expected a display similar to that of a pandas DataFrame, like so: [screenshot]

How can I make the Polars dataframe display correctly?


Solution

  • Update (2023-08-23): the latest release of VSCode calls to_pandas automatically, so you no longer need the alias described below.


    VSCode tries to display any variable whose type name is DataFrame in the data viewer. It does not check the fully qualified name, so it treats a polars.DataFrame the same way as a pandas.DataFrame.
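To illustrate why a name-only check cannot tell the two libraries apart, here is a minimal sketch using stand-in classes instead of the real libraries. The helpers make_dataframe_class, short_name and qualified_name are hypothetical names for this example, and the module paths are only illustrative; this is not the code VSCode runs.

```python
def make_dataframe_class(module_name):
    """Build a stand-in class named DataFrame that reports the given module."""
    return type("DataFrame", (), {"__module__": module_name})


# Stand-ins only: neither pandas nor polars is imported here.
PandasLike = make_dataframe_class("pandas.core.frame")
PolarsLike = make_dataframe_class("polars.dataframe.frame")


def short_name(obj):
    """The bare type name, which is all a name-only check looks at."""
    return type(obj).__name__


def qualified_name(obj):
    """Module plus class name, which does distinguish the two types."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


pandas_like = PandasLike()
polars_like = PolarsLike()

# Both objects report the same short name, but different qualified names.
print(short_name(pandas_like), short_name(polars_like))
print(qualified_name(pandas_like))
print(qualified_name(polars_like))
```

A check based on the qualified name would be one way for a viewer to handle each library separately.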

    See: https://github.com/microsoft/vscode-jupyter/blob/main/pythonFiles/vscode_datascience_helpers/getJupyterVariableDataFrameInfo.py

    If the DataFrame has a method named toPandas, VSCode calls it to convert the object first. A polars DataFrame has no such method, so no conversion happens.
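The lookup described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual VSCode helper: prepare_for_viewer, SparkLike and PolarsLike are hypothetical names, and the stand-in classes return a plain string where a real library would return a pandas DataFrame.

```python
class SparkLike:
    class DataFrame:
        """Stand-in for a DataFrame type that offers toPandas."""

        def toPandas(self):
            return "converted to pandas"


class PolarsLike:
    class DataFrame:
        """Stand-in for a DataFrame type that only offers to_pandas."""

        def to_pandas(self):
            return "converted to pandas"


def prepare_for_viewer(obj):
    """Hypothetical helper mimicking the name check and the toPandas lookup."""
    if type(obj).__name__ != "DataFrame":
        return None  # not something the data viewer would open
    converter = getattr(obj, "toPandas", None)
    if callable(converter):
        return converter()  # converted before display
    return obj  # no toPandas found: handled as if it were already pandas


spark_result = prepare_for_viewer(SparkLike.DataFrame())
polars_obj = PolarsLike.DataFrame()
polars_result = prepare_for_viewer(polars_obj)

print(spark_result)
print(polars_result is polars_obj)
```

In this sketch the polars-like object passes the name check but is never converted, which matches the garbled display described in the question.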

    Either VSCode would need to add proper support for polars, or polars would have to implement a toPandas method.

    However, since polars already has a to_pandas method, you can create an alias for it, and the DataFrame will display as expected.

    import polars as pl

    df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
    # Expose to_pandas under the name the data viewer looks for (this instance only).
    df.toPandas = df.to_pandas


    Screenshot of data viewer showing correct column names of a pl.DataFrame