I have a dataframe that looks like this.
I would like to add a column "Tile" that looks like this
is numpy.tile()
the correct tool for this?
Ultimately what I am trying to do is count the month gaps each person has between the CRP event. So up until there is an event, the event is null. That would be a zero. When the next event happens, the tile would be 1. then the next few events if null would be a 1 until another event happens. if there is another CRP in consecutive order, that would trigger a new tile i.e. 3 if the previous was 2.
EDIT: The tile is supposed to start over per member. I forgot that important detail. So I will try to use the cumsum() with a window.
you can compare column with 'CRP
to get True
/False
and later use .cumsum()
which will treat True
as 1
and False
as 0
Minimal working code:
import pandas as pd
df = pd.DataFrame({
'event':[None, None, 'CRP', None, None, None, 'CRP', 'CRP', None, None, None, None]
})
print(df)
df['tile'] = (df['event'] == 'CRP').cumsum()
print(df)
Result:
event tile
0 None 0
1 None 0
2 CRP 1
3 None 1
4 None 1
5 None 1
6 CRP 2
7 CRP 3
8 None 3
9 None 3
10 None 3
11 None 3
Other idea: because you have only CRP
and Null
so you can use exanding(1)
which will get rows similar to cumsum()
(and other cumulative functions) and use .count()
which will count not-null values
df['tile'] = df['event'].expanding(1).count().astype(int)
UPDATE: (after adding details in question)
If you want it for every member then you can use groupby()
df.groupby('member')['event'].expanding(1).count().astype(int).values
or
df.groupby('member')['event'].expanding(1).count().astype(int).reset_index(drop=True)
Minimal working code used for tests:
import pandas as pd
df = pd.DataFrame({
'member': ["A"]*12 + ["B"]*12,
'event': [None, None, 'CRT', None, None, None, 'CRT', 'CRT', None, None, None, None]*2,
})
print(df)
# --- old version ---
#df['tile'] = (df['event'] == 'CRT').cumsum()
#print(df)
#df['tile2'] = df['event'].expanding(1).count().astype(int)
#print(df)
# --- grouping by memeber ---
df['tile'] = (df['event'] == 'CRT') # create column with True/False
df['tile'] = df.groupby('member')['tile'].cumsum() # count True/False in this column
print(df)
df['tile2'] = df.groupby('member')['event'].expanding(1).count().astype(int).values
print(df)
df['tile3'] = df.groupby('member')['event'].expanding(1).count().astype(int).reset_index(drop=True)
print(df)
Result:
member event tile
0 A None 0
1 A None 0
2 A CRT 1
3 A None 1
4 A None 1
5 A None 1
6 A CRT 2
7 A CRT 3
8 A None 3
9 A None 3
10 A None 3
11 A None 3
12 B None 0
13 B None 0
14 B CRT 1
15 B None 1
16 B None 1
17 B None 1
18 B CRT 2
19 B CRT 3
20 B None 3
21 B None 3
22 B None 3
23 B None 3