Let's say, I have number A
and they call several people B
A B
123 987
123 987
123 124
435 567
435 789
653 876
653 876
999 654
999 654
999 654
999 123
I want to find to whom the person in A
has called maximum times and also the number of times.
OUTPUT:
A B Count
123 987 2
435 567 or 789 1
653 876 2
999 654 3
How one can think of it is,
A B
123 987 2
124 1
435 567 1
789 1
653 876 2
999 654 3
123 1
Can somebody help me out on how to do this?
We can get the desired output in 2 steps:
A
(value_counts
with 2 columns)A
(boolean indexing)# count the unique values in rows
# `sort_index` is optional
df.value_counts(['A','B']).sort_index()
A B
123 124 1
987 2
435 567 1
789 1
653 876 2
999 123 1
654 3
dtype: int64
To get the highest values for each unique A:
v = df.value_counts(['A','B'])
agg_df = v[~v.reset_index(level=0).duplicated('A').values]
You can also replace the second part with groupby.agg
that gets the max of the count and its corresponding B
value. Then the whole thing can be written in one line:
agg_df = (
df.value_counts(['A','B'])
.reset_index()
.groupby('A').agg({'B': 'first', 'count': 'max'})
)
A B
999 654 3
123 987 2
653 876 2
435 567 1
dtype: int64