[SOLVED] Pandas Assign, Lambda, List Comprehension Question

I receive data as a list of dicts in a single column. Each list can be a different length. Sample data looks like this:

import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]]
    ],
    columns=['data'],
)

df
                                          data
0   [{'value': 1}, {'value': 2}, {'value': 3}]
1   [{'value': 4}, {'value': 5}]

I want to create a new column min_val which contains the minimum value for each row. I'm trying this:

df.assign(min_val=lambda row: min(val['value'] for val in row.data))

But I get the error:

TypeError: list indices must be integers or slices, not str

A very similar lambda/comprehension combination works in Dask Bag but not in plain pandas, which is very confusing.

Any help would be very much appreciated.

Solution

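When assign is given a callable, it calls it once with the entire DataFrame, not once per row. In the lambda above, row.data is therefore the whole column and each val is a list, which is why indexing it with 'value' raises the TypeError. Apply the function to the data Series instead: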

df = df.assign(min_val=df.data.apply(lambda r: min(v['value'] for v in r)))

Output:

                                         data  min_val
0  [{'value': 1}, {'value': 2}, {'value': 3}]        1
1                [{'value': 4}, {'value': 5}]        4
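
To see what assign actually passes to a callable, here is a minimal sketch using the sample data from the question. The helper name min_per_row and the seen list are made up for illustration. It also shows that the callable form can be kept, as long as the per-row work is done with apply inside it:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]],
    ],
    columns=['data'],
)

seen = []  # records the type of every argument the callable receives


def min_per_row(frame):
    # Hypothetical helper: assign calls this once with the whole DataFrame.
    seen.append(type(frame).__name__)
    # The per-row work happens here, via Series.apply on the 'data' column.
    return frame['data'].apply(lambda row: min(item['value'] for item in row))


result = df.assign(min_val=min_per_row)

print(seen)                        # ['DataFrame']
print(result['min_val'].tolist())  # [1, 4]
```

The callable form is mostly useful in method chains, where the DataFrame being assigned to has no name yet. For a DataFrame that already has a name, passing df.data.apply(...) directly, as in the answer above, is equivalent.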