Each customer appears on more than one row when they have more than one plan. I want to set a status per customer:
If every product has 'canceled_at' filled, the customer status is 'canceled'. If at least one product, but not every product, has 'canceled_at' filled, the status is 'downgrade', because the customer lost a product.
customer|canceled_at|status
x |3/27/2018 |
x | |
y |2/2/2018 |
y |2/2/2018 |
z |1/1/2018 |
a | |
I already have the canceled status; now I only need the downgrade:
df['status']=(df.groupby('customer')['canceled_at'].
transform(lambda x: x.notna().all()).map({True:'canceled'})).fillna(df.status)
Expected output:
customer|canceled_at|status
x |3/27/2018 |downgrade
x | |downgrade
y |2/2/2018 |canceled
y |2/2/2018 |canceled
z |1/1/2018 |canceled
a | |
You can test the column for non-missing values, group by the customer Series, and use GroupBy.transform with GroupBy.all and GroupBy.any to check whether all values in a group are non-missing (all) or at least one value is non-missing (any). Then pass both masks to numpy.select:
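import numpy as np

# True where canceled_at is filled, grouped per customer
g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')  # every row of the customer has canceled_at
m2 = g.transform('any')  # at least one row of the customer has canceled_at
df['status'] = np.select([m1, m2], ['canceled', 'downgrade'], np.nan)
print(df)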
customer canceled_at status
0 x 3/27/2018 downgrade
1 x NaN downgrade
2 y 2/2/2018 canceled
3 y 2/2/2018 canceled
4 z 1/1/2018 canceled
5 a NaN nan
Or, with an empty string as the default:
df['status'] = np.select([m1, m2],['canceled','downgrade'], '')
print (df)
customer canceled_at status
0 x 3/27/2018 downgrade
1 x NaN downgrade
2 y 2/2/2018 canceled
3 y 2/2/2018 canceled
4 z 1/1/2018 canceled
5 a NaN
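Note that numpy.select takes, for each row, the first condition that is True, so m1 has to be listed before m2 (every "all" row is also an "any" row). A minimal sketch of this, assuming the sample data from the question is rebuilt by hand with canceled_at kept as strings; the names status and swapped are only for illustration:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumption: dates kept as strings)
df = pd.DataFrame({
    'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
    'canceled_at': ['3/27/2018', np.nan, '2/2/2018', '2/2/2018', '1/1/2018', np.nan],
})

g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')  # True for y and z only
m2 = g.transform('any')  # True for x, y and z

# Correct order: the stricter condition first
status = np.select([m1, m2], ['canceled', 'downgrade'], '')

# Swapped order: m2 matches first, so 'canceled' is never assigned
swapped = np.select([m2, m1], ['downgrade', 'canceled'], '')

print(status.tolist())
print(swapped.tolist())
```

With the order swapped, customers y and z would be labelled downgrade as well, which is why the stricter mask goes first.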
If groups that contain only NaNs should also be set to downgrade:
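# True only for customers whose every row has canceled_at filled
mask = df['canceled_at'].notna().groupby(df['customer']).transform('all')
# Everything else, including customers with no cancellation at all, becomes 'downgrade'
df['status'] = np.where(mask, 'canceled', 'downgrade')
print(df)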
customer canceled_at status
0 x 3/27/2018 downgrade
1 x NaN downgrade
2 y 2/2/2018 canceled
3 y 2/2/2018 canceled
4 z 1/1/2018 canceled
5 a NaN downgrade
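One caveat about the first variant: the lowercase nan printed for customer a suggests that the np.nan default ended up as the string 'nan' rather than a real missing value, and some NumPy versions may refuse to mix string choices with a float default at all. If a true missing value is wanted, one possible sketch (not part of the answer above, and assuming the same hand-built sample data) is to build the two-way label first and then blank out the groups with no cancellation using Series.where:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumption: dates kept as strings)
df = pd.DataFrame({
    'customer': ['x', 'x', 'y', 'y', 'z', 'a'],
    'canceled_at': ['3/27/2018', np.nan, '2/2/2018', '2/2/2018', '1/1/2018', np.nan],
})

g = df['canceled_at'].notna().groupby(df['customer'])
m1 = g.transform('all')  # every row of the customer has canceled_at
m2 = g.transform('any')  # at least one row of the customer has canceled_at

# Two-way label for every row, aligned to the original index
status = pd.Series(np.where(m1, 'canceled', 'downgrade'), index=df.index)

# Keep the label only where the customer has at least one cancellation;
# the remaining rows become a real missing value
df['status'] = status.where(m2)

print(df)
```

Afterwards df['status'].isna() is True only for customer a, so the column can be filtered or filled with the usual missing-value methods.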