In each group, each 1st unique item should be given a different number in new column 'num'.
I can form the groups but I don't know how to number the unique pieces.
Is there a way to do that ?
Unique numbers are:
AF=1 / CT=2 / RT=4 / CTS=4
data = {'ATEXT': ['AF', 'AF', '', '', 'CT', 'RT', '', 'AF', 'AF', 'CTS', 'AF', 'AF', 'AF', 'CT', 'AF', 'CT', 'AF', 'AF', 'AF', 'AF', 'RT', 'RT', '', '', 'AF', 'CT', 'CT', 'RT', 'AF', 'AF', 'CT']}
df = pd.DataFrame(data)
df
End Result should be:
Out[6]:
ATEXT num
0 AF 1
1 AF
2
3
4 CT 2
5 RT 3
6
7 AF 1
8 AF
9 CTS 4
10 AF
11 AF
12 AF
13 CT 2
14 AF
15 CT
16 AF
17 AF
18 AF
19 AF
20 RT 3
21 RT
22
23
24 AF 1
25 CT 2
26 CT
27 RT 3
28 AF
29 AF
30 CT
my idea (does not yet give a useful result) :
nl = df['ATEXT']!=("")
df['num'] = (df['ATEXT'].mask(nl)
.groupby(nl)
.where(~df.ATEXT.duplicated()
)
IIUC, I using a few intermediate columns to help with logic. Instead of using factorize
you could use map
to assign your unique numbers.
Try:
df['CODE'] = df['ATEXT'].mask(df['ATEXT'] == '').factorize()[0] + 1
df.loc[df['ATEXT'] == '', 'CODE'] = np.nan
df['grp'] = df['ATEXT'].eq('').cumsum()
df['Num'] = df.groupby('grp', group_keys=False).apply(lambda x: x.drop_duplicates('ATEXT'))['CODE']
df['Num'] = df['Num'].fillna('')
df[['ATEXT', 'Num']]
Output:
ATEXT Num
0 AF 1.0
1 AF
2
3
4 CT 2.0
5 RT 3.0
6
7 AF 1.0
8 AF
9 CTS 4.0
10 AF
11 AF
12 AF
13 CT 2.0
14 AF
15 CT
16 AF
17 AF
18 AF
19 AF
20 RT 3.0
21 RT
22
23
24 AF 1.0
25 CT 2.0
26 CT
27 RT 3.0
28 AF
29 AF
30 CT
Or, much like you were doing, let use transform
with groupby
to mask duplicates with where
and duplicated
in each group:
df['Num_2'] = df.groupby('grp')['CODE'].transform(lambda x: x.where(~x.duplicated(keep='first')).fillna(''))
df[['ATEXT', 'Num_2']]
Output:
ATEXT Num_2
0 AF 1.0
1 AF
2
3
4 CT 2.0
5 RT 3.0
6
7 AF 1.0
8 AF
9 CTS 4.0
10 AF
11 AF
12 AF
13 CT 2.0
14 AF
15 CT
16 AF
17 AF
18 AF
19 AF
20 RT 3.0
21 RT
22
23
24 AF 1.0
25 CT 2.0
26 CT
27 RT 3.0
28 AF
29 AF
30 CT