I receive data as a list of dicts in a single column. Each list can be a different length. Sample data looks like this:
```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]],
    ],
    columns=['data'],
)
print(df)
```

```
                                         data
0  [{'value': 1}, {'value': 2}, {'value': 3}]
1                [{'value': 4}, {'value': 5}]
```
I want to create a new column, `min_val`, that holds the minimum `'value'` in each row's list. I'm trying this:
```python
df.assign(min_val=lambda row: min(val['value'] for val in row.data))
```

But I get the error:

```
TypeError: list indices must be integers or slices, not str
```
A very similar lambda/comprehension combination works with a Dask Bag but not in plain pandas, which I find very confusing.
Any help would be very much appreciated.
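To check what `assign` actually passes to the callable, one can have the callable record its argument. This is a small sketch built on the sample frame above; `probe_arg` and `seen` are hypothetical names used only for illustration, and the callable returns a scalar placeholder that `assign` broadcasts to every row.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]],
    ],
    columns=['data'],
)

seen = []  # hypothetical: collects whatever assign passes to the callable


def probe_arg(arg):
    """Record the callable's argument and return a scalar placeholder."""
    seen.append(arg)
    return 0  # assign broadcasts this scalar to every row


df.assign(probe=probe_arg)

first = seen[0]
print(type(first).__name__)                  # DataFrame: the whole frame, not a row
print(type(first['data']).__name__)          # Series: the entire 'data' column
print(type(first['data'].iloc[0]).__name__)  # list: iterating the column yields lists
```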
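`assign` with a callable argument calls it once with the entire DataFrame, not once per row. In your lambda, `row` is therefore the whole DataFrame and `row.data` is the full `data` Series; iterating over that Series yields the lists themselves, so `val['value']` tries to index a list with a string, which raises the `TypeError`. Instead, `apply` your function to the `data` Series: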
```python
df = df.assign(min_val=df.data.apply(lambda r: min(v['value'] for v in r)))
```

Output:

```
                                         data  min_val
0  [{'value': 1}, {'value': 2}, {'value': 3}]        1
1                [{'value': 4}, {'value': 5}]        4
```
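For comparison, here are two more sketches on the same sample data. The first keeps the question's `assign`-with-a-callable style and applies the per-row logic to the `data` Series that the callable receives. The second swaps in a different technique: `Series.explode` to get one dict per row (the original index repeats), `Series.str.get` to read the `'value'` key, and `groupby(level=0).min()` to collapse back to one value per original row. Both assume every dict has a numeric `'value'` key; whether the explode route is faster on real data is untested here.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{'value': 1}, {'value': 2}, {'value': 3}]],
        [[{'value': 4}, {'value': 5}]],
    ],
    columns=['data'],
)

# Variant 1: keep assign's callable, but apply the per-row logic to the
# 'data' Series taken from the DataFrame the callable receives.
out1 = df.assign(
    min_val=lambda d: d['data'].apply(lambda lst: min(v['value'] for v in lst))
)

# Variant 2: explode each list into one dict per row (index repeats),
# read the 'value' key, then take the minimum per original row label.
values = pd.to_numeric(df['data'].explode().str.get('value'))
out2 = df.assign(min_val=values.groupby(level=0).min())

print(out1['min_val'].tolist())  # [1, 4]
print(out2['min_val'].tolist())  # [1, 4]
```

One behavioral difference: for an empty list, the `min(...)` call in the first variant raises `ValueError`, while the explode variant should yield `NaN` for that row.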