juliaijulia-notebook

Using Julia to expand a list of dicts fields in dataframe


I have a column in a dataframe that is a list of dictionaries:

[{"key1": value1, "key2": "value2", "key3": "value3", "key4": "value4"}, {"key1": value1, "key2": "value2", "key3": "value3", "key4": "value4"}]

Is there a way to expand this column to get something like this:

key1    key2     key3      key4
value1  value2   value3    value4
value1  value2   value3    value4

Note: key_ can be any string, value_ can be any value.


Solution

  • This is quite easy:

    julia> df = DataFrame(col=[Dict("key1"=>"value1", "key2"=>"value2", "key3"=>"value3", "key4"=>"value4"),
                               Dict("key1"=>"value1", "key2"=>"value2", "key3"=>"value3", "key4"=>"value4")])
    2×1 DataFrame
     Row │ col
         │ Dict…
    ─────┼───────────────────────────────────
       1 │ Dict("key2"=>"value2", "key3"=>"…
       2 │ Dict("key2"=>"value2", "key3"=>"…
    
    julia> select(df, :col => AsTable)
    2×4 DataFrame
     Row │ key2    key3    key1    key4
         │ String  String  String  String
    ─────┼────────────────────────────────
       1 │ value2  value3  value1  value4
       2 │ value2  value3  value1  value4
    

    The only limitation is that the order of resulting columns is undefined as Dict does not guarantee key order. You would need to re-order them in the second step e.g. like this:

    julia> select(select(df, :col => AsTable), string.("key", 1:4))
    2×4 DataFrame
     Row │ key1    key2    key3    key4
         │ String  String  String  String
    ─────┼────────────────────────────────
       1 │ value1  value2  value3  value4
       2 │ value1  value2  value3  value4
    

    Another approach would be:

    julia> select(df, :col .=> [ByRow(x -> x["key$i"]) => "key$i" for i in 1:4])
    2×4 DataFrame
     Row │ key1    key2    key3    key4
         │ String  String  String  String
    ─────┼────────────────────────────────
       1 │ value1  value2  value3  value4
       2 │ value1  value2  value3  value4
    

    it is a bit more complex conceptually and requires you to know what keys you want to extract, but the benefit is that you do the operation in one shot.