I'm writing a script to identify clusters of data in one dimension (depth) defined by a user specified gap in depth for multiple oil wells. The script looks at the difference in depth (d_dep) and then uses a cumsum() to number unique cluster_id. I want to modify the cluster_id to start over at "1" for each new occurrence of a well (id). My "for" loop in the list of IDs is not working. It seems like it should be a simple solution, but I seem to be missing something.
# Create Data
data = {'id': [1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3],
'depth': [1,2,10,11,70, 0,20,75,150,155, 0,100,105,200,210],
'temp': [1,1,1,1,1, 2,2,2,2,2, 3,3,3,3,3],
'surfT': [9,9,9,9,9,9,9,9,9,9,9,9,9,9,9]}
# Create DataFrame
df = pd.DataFrame(data)
# SORT by id and depth
df.sort_values(by=['id', 'depth'], inplace=True, ascending=True)
# df.head(20)
# Create a list of unique IDs
IDs = df['id'].unique()
print('Unique IDs: ', IDs)
# IDENTIFY DATA CLUSTERS BY DEPTH
#===============================================================================================
#1) Calculate the difference in depth between samples, and the first occurence of id is zero
df['d_dep']= np.abs(np.where(df['id'].shift(1) != df['id'], 0, df['depth'].shift(1) - df['depth']))
#2) Flag clusters by depth using the gap threshold
gap= 50 # Gap Theshold
# Flag clusters by the gap threshold, where the d_dep > gap or the first observation in id
df['cluster'] = np.where((df['d_dep']>=gap) | (df['id'].shift(1) != df['id']), 1, 0)
# Assign unique numeric id for clusters of data based on the gap threshold
for i in IDs:
df['cluster_id'] = df['cluster'].cumsum()
display(df)
I tried:
for i in IDs:
df['cluster_id'] = df['cluster'].cumsum()
And I get the following Table:
The cluster_id for the first id is correct, but for id numbers 2 and 3, the first cluster_id should be 1 and start counting over again.
If I understand what you require you can use:
df['cluster_id'] = df.groupby('id')['cluster'].transform('cumsum')
which gives:
id depth temp surfT d_dep cluster cluster_id
0 1 1 1 9 0.0 1 1
1 1 2 1 9 1.0 0 1
2 1 10 1 9 8.0 0 1
3 1 11 1 9 1.0 0 1
4 1 70 1 9 59.0 1 2
5 2 0 2 9 0.0 1 1
6 2 20 2 9 20.0 0 1
7 2 75 2 9 55.0 1 2
8 2 150 2 9 75.0 1 3
9 2 155 2 9 5.0 0 3
10 3 0 3 9 0.0 1 1
11 3 100 3 9 100.0 1 2
12 3 105 3 9 5.0 0 2
13 3 200 3 9 95.0 1 3
14 3 210 3 9 10.0 0 3