[SOLVED] Using numpy tile to group rows by column in a python data frame

Using numpy tile to group rows by column in a python data frame

I have a dataframe that looks like this.
enter image description here

I would like to add a column "Tile" that looks like this
enter image description here

is numpy.tile() the correct tool for this?

Ultimately what I am trying to do is count the month gaps each person has between the CRP event. So up until there is an event, the event is null. That would be a zero. When the next event happens, the tile would be 1. then the next few events if null would be a 1 until another event happens. if there is another CRP in consecutive order, that would trigger a new tile i.e. 3 if the previous was 2.

EDIT: The tile is supposed to start over per member. I forgot that important detail. So I will try to use the cumsum() with a window.

Solution

you can compare column with 'CRP to get True/False and later use .cumsum()
which will treat True as 1 and False as 0

Minimal working code:

import pandas as pd

df = pd.DataFrame({
   'event':[None, None, 'CRP', None, None, None, 'CRP', 'CRP', None, None, None, None]
})
print(df)

df['tile'] = (df['event'] == 'CRP').cumsum()

print(df)

Result:

   event  tile
0   None     0
1   None     0
2    CRP     1
3   None     1
4   None     1
5   None     1
6    CRP     2
7    CRP     3
8   None     3
9   None     3
10  None     3
11  None     3

Other idea: because you have only CRP and Null so you can use exanding(1) which will get rows similar to cumsum() (and other cumulative functions) and use .count() which will count not-null values

df['tile'] = df['event'].expanding(1).count().astype(int)

UPDATE: (after adding details in question)

If you want it for every member then you can use groupby()

df.groupby('member')['event'].expanding(1).count().astype(int).values

df.groupby('member')['event'].expanding(1).count().astype(int).reset_index(drop=True)

Minimal working code used for tests:

import pandas as pd

df = pd.DataFrame({
    'member': ["A"]*12 + ["B"]*12,
    'event': [None, None, 'CRT', None, None, None, 'CRT', 'CRT', None, None, None, None]*2,
})
print(df)

# --- old version ---

#df['tile'] = (df['event'] == 'CRT').cumsum()
#print(df)

#df['tile2'] = df['event'].expanding(1).count().astype(int)
#print(df)

# --- grouping by memeber ---

df['tile'] = (df['event'] == 'CRT') # create column with True/False
df['tile'] = df.groupby('member')['tile'].cumsum()  # count True/False in this column
print(df)

df['tile2'] = df.groupby('member')['event'].expanding(1).count().astype(int).values
print(df)

df['tile3'] = df.groupby('member')['event'].expanding(1).count().astype(int).reset_index(drop=True)
print(df)

Result:

   member event  tile
0       A  None     0
1       A  None     0
2       A   CRT     1
3       A  None     1
4       A  None     1
5       A  None     1
6       A   CRT     2
7       A   CRT     3
8       A  None     3
9       A  None     3
10      A  None     3
11      A  None     3
12      B  None     0
13      B  None     0
14      B   CRT     1
15      B  None     1
16      B  None     1
17      B  None     1
18      B   CRT     2
19      B   CRT     3
20      B  None     3
21      B  None     3
22      B  None     3
23      B  None     3