I'm using Polars to process a DataFrame so I can save it as JSON. I know I can use the method .write_json(); however, I would like to add a new level to the JSON.
My current approach:
import polars as pl

df = pl.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "variable1": [15, 25, 5, 10, 20],
    "variable2": [40, 30, 50, 10, 20],
})

(
    df.write_json()
)
Current output:
'[{"id":1,"variable1":15,"variable2":40},{"id":2,"variable1":25,"variable2":30},{"id":3,"variable1":5,"variable2":50},{"id":4,"variable1":10,"variable2":10},{"id":5,"variable1":20,"variable2":20}]'
But I would like to save it with a top-level "Befs" key, so that "Befs" contains every record of the DataFrame.
Desired output:
{
    "Befs": [
        {
            "id": 1,
            "variable1": 15,
            "variable2": 40
        },
        {
            "id": 2,
            "variable1": 25,
            "variable2": 30
        }
    ]
}
I have tried using pl.struct(), but my attempts make no sense:
(
    df
    .select(
        pl.struct(
            pl.lit("Bef").alias("Bef"),
            pl.col("id"),
            pl.col("variable1"),
            pl.col("variable2")
        )
    )
    .write_json()
)
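The write_json() function always serializes the data in a row-oriented format: the root element is a list, and each row is an object mapping column_name -> row_value. Wrapping the columns in a struct doesn't change that; it only nests each row one level deeper, so the root is still a list.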
As a hacky workaround, you could use write_ndjson() instead, since its root element is an object (one per line). For that to match your desired output, though, you'll have to pack each row into a struct and then implode the result into a single row, so that only one line is written.
df.select(Befs=pl.struct(pl.all()).implode()).write_ndjson()