I have a set of data and want to get its value_counts result:
df = pd.DataFrame(
[5.01, 5.01, 5.08, 6.1, 5.54, 6.3, 5.56, 5.55, 6.7],
columns=['val'])
>>> df
val
0 5.01
1 5.01
2 5.08
3 6.10
4 5.54
5 6.30
6 5.56
7 5.55
8 6.70
>>> df.val.value_counts()
5.01 2
5.08 1
6.10 1
5.54 1
6.30 1
5.56 1
5.55 1
6.70 1
Name: val, dtype: int64
Is there a way to allow a certain tolerance when using value_counts, such as plus or minus 0.01, so that 5.54, 5.55, and 5.56 in the series are calculated as a group? The result I hope is:
[5.54,5.56,5.55] 3
[5.01] 2
[5.08] 1
[6.10] 1
...
try this:
mask = df['val'].sort_values().diff().gt(0.01)
result_df = df.groupby(mask.cumsum())['val'].agg([set, 'count'])
print(result_df)
set count
val
0 {5.01} 2
1 {5.08} 1
2 {5.55, 5.54, 5.56} 3
3 {6.1} 1
4 {6.3} 1
5 {6.7} 1