I have a column in a dataframe that is a list of dictionaries:
[{"key1": value1, "key2": "value2", "key3": "value3", "key4": "value4"}, {"key1": value1, "key2": "value2", "key3": "value3", "key4": "value4"}]
Is there a way to expand this column to get something like this:
key1 key2 key3 key4
value1 value2 value3 value4
value1 value2 value3 value4
Note: key_
can be any string, value_
can be any value.
This is quite easy:
julia> df = DataFrame(col=[Dict("key1"=>"value1", "key2"=>"value2", "key3"=>"value3", "key4"=>"value4"),
Dict("key1"=>"value1", "key2"=>"value2", "key3"=>"value3", "key4"=>"value4")])
2×1 DataFrame
Row │ col
│ Dict…
─────┼───────────────────────────────────
1 │ Dict("key2"=>"value2", "key3"=>"…
2 │ Dict("key2"=>"value2", "key3"=>"…
julia> select(df, :col => AsTable)
2×4 DataFrame
Row │ key2 key3 key1 key4
│ String String String String
─────┼────────────────────────────────
1 │ value2 value3 value1 value4
2 │ value2 value3 value1 value4
The only limitation is that the order of resulting columns is undefined as Dict
does not guarantee key order. You would need to re-order them in the second step e.g. like this:
julia> select(select(df, :col => AsTable), string.("key", 1:4))
2×4 DataFrame
Row │ key1 key2 key3 key4
│ String String String String
─────┼────────────────────────────────
1 │ value1 value2 value3 value4
2 │ value1 value2 value3 value4
Another approach would be:
julia> select(df, :col .=> [ByRow(x -> x["key$i"]) => "key$i" for i in 1:4])
2×4 DataFrame
Row │ key1 key2 key3 key4
│ String String String String
─────┼────────────────────────────────
1 │ value1 value2 value3 value4
2 │ value1 value2 value3 value4
it is a bit more complex conceptually and requires you to know what keys you want to extract, but the benefit is that you do the operation in one shot.