I am trying to solve a problem in my data-frame
df.head()
0 Key value
1 10 500
2 11 500
3 12 600
4 12 800
5 13 1000
6 13 1200
.
.
.
200++
output is to put the values in the above data-frame or have another data-frame with all the values of above with additional info show as below.
Expected Output:
0 Key value
1 10 500
2 11 500
3 12 600
4 12 800
5 12 1400 -----> Addition of 800+600 as keys are same
6 13 1000
7 13 1200
8 13 2200 -----> Addition of 1000+12000 as keys are same
.
.
200++
I am just starting out in Pandas, any help will be appreciate.
A possible solution, which takes the following steps:
First, groupby
is used to organize df
by Key
, and then filter
is applied with a lambda function that selects only groups with more than one row, ensuring sums are computed only for repeated Key
values.
Next, this filtered group is re-aggregated with groupby
to calculate the sum
of value
within each group of Key
values using sum
. The concatenated dataframe is then organized by Key
and value
columns through sort_values
, with ignore_index=True
to reset indexing.
(pd.concat([
df,
df.groupby('Key').filter(lambda x: len(x) > 1)
.groupby('Key', as_index=False)['value'].sum()])
.sort_values('Key', ignore_index=True))
Another possible solution:
pd.concat(
[df,
df.groupby('Key', as_index=False)['value']
.apply(lambda x: None if len(x) == 1 else sum(x))
.dropna()]).sort_values('Key', ignore_index=True)
Output:
Key value
0 10 500
1 11 500
2 12 600
3 12 800
4 12 1400
5 13 1000
6 13 1200
7 13 2200