python python-polars

How to add a new level to JSON output using Polars in Python?


I'm using Polars to process a DataFrame so I can save it as JSON. I know I can use the .write_json() method, but I would like to add a new level to the JSON.

My current approach:

import polars as pl


df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, 25, 5, 10, 20],
    "variable2": [40, 30, 50, 10, 20],
}) 
(
    df.write_json()
)

Current output:

'[{"id":1,"variable1":15,"variable2":40},{"id":2,"variable1":25,"variable2":30},{"id":3,"variable1":5,"variable2":50},{"id":4,"variable1":10,"variable2":10},{"id":5,"variable1":20,"variable2":20}]'

But I would like to save it this way, under a top-level "Befs" key that contains every record of the DataFrame.

Desired output:

{
    "Befs": [
        {
            "ID ": 1,
            "variable1": 15,
            "variable2": 40
        },
        {
            "ID ": 2,
            "variable1": 25,
            "variable2": 30
        }

    ]
}

I have tried using pl.struct(), but my attempts make no sense:

(
    df
    .select(
        pl.struct(
            pl.lit("Bef").alias("Bef"),
            pl.col("id"),
            pl.col("variable1"),
            pl.col("variable2")
        )
    )
    .write_json()
)

Solution

  • The write_json() method always returns the data in a row-oriented format: the root element is a list, and each element of that list is a mapping of column_name -> row_value for one row.

    As a hacky workaround, you could use write_ndjson() instead, since the root element of each line it writes is a dictionary. To match your desired output, you have to pack each row into a struct and implode those structs into a single row, so that the whole DataFrame is written as one line.

    df.select(Befs=pl.struct(pl.all()).implode()).write_ndjson()