let's say i have and dataframe as below
date,ent_id,val
2021-03-23,101,61
2021-03-12,103,64
2021-03-15,101,32
2021-04-01,103,39
2021-04-02,101,71
2021-04-02,103,79
2021-04-30,101,51
2021-04-30,103,53
2021-05-31,101,28
2021-05-31,103,26
2021-05-31,101,47
2021-05-31,103,61
2021-06-06,101,45
2021-06-06,103,78
2021-06-07,101,23
2021-06-07,103,31
2021-07-31,101,14
2021-07-31,103,02
2021-07-31,101,82
2021-07-31,103,15
i want to create an addition column in dataframe which contain month end date based of following condition
case
when DAYNAME('date')='Sunday' then days_add(date,-2)
when DAYNAME('date')='Saturday' then days_add(date,-1)
else date
so the output would be like this
date,ent_id,val,month_end
2021-03-23,101,61,2021-03-31
2021-03-12,103,64,2021-03-31
2021-03-15,101,32,2021-03-31
2021-04-01,103,39,2021-04-30
2021-04-02,101,71,2021-04-30
2021-04-02,103,79,2021-04-30
2021-04-30,101,51,2021-04-30
2021-04-30,103,53,2021-04-30
2021-05-31,101,28,2021-05-31
2021-05-31,103,26,2021-05-31
2021-05-31,101,47,2021-05-31
2021-05-31,103,61,2021-05-31
2021-06-06,101,45,2021-06-30
2021-06-06,103,78,2021-06-30
2021-06-07,101,23,2021-06-30
2021-06-07,103,31,2021-06-30
2021-07-31,101,14,2021-07-31
2021-07-31,103,02,2021-07-31
2021-07-31,101,82,2021-07-31
2021-07-31,103,15,2021-07-31
my effort
import pandas as pd
from datetime import timedelta
from pandas.tseries.offsets import MonthEnd
import numpy as np
df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Sunday','month_end'] =[df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Sunday']['date']+timedelta(days=-2)]
df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Saturday','month_end'] =[df.loc[(df['date']+MonthEnd(0)).dt.day_name()=='Saturday']['date']+timedelta(days=-1)]
but getting this error
ValueError: Must have equal len keys and value when setting with an ndarray
any other better solution is most welcome
You can use pd.offsets.MonthEnd(n=0)
and then adjust for Sunday and Saturday. Note that n=0
is needed or else 2021-07-31 will be rolled forward to 2021-08-31.
df['month_end'] = df['date'] + pd.offsets.MonthEnd(n=0)
df.loc[df['month_end'].dt.day_name() == 'Sunday', 'month_end'] -= pd.DateOffset(2)
df.loc[df['month_end'].dt.day_name() == 'Saturday', 'month_end'] -= pd.DateOffset(1)
print(df)
date ent_id val month_end
0 2021-03-23 101 61 2021-03-31
1 2021-03-12 103 64 2021-03-31
2 2021-03-15 101 32 2021-03-31
3 2021-04-01 103 39 2021-04-30
4 2021-04-02 101 71 2021-04-30
5 2021-04-02 103 79 2021-04-30
6 2021-04-30 101 51 2021-04-30
7 2021-04-30 103 53 2021-04-30
8 2021-05-31 101 28 2021-05-31
9 2021-05-31 103 26 2021-05-31
10 2021-05-31 101 47 2021-05-31
11 2021-05-31 103 61 2021-05-31
12 2021-06-06 101 45 2021-06-30
13 2021-06-06 103 78 2021-06-30
14 2021-06-07 101 23 2021-06-30
15 2021-06-07 103 31 2021-06-30
16 2021-07-31 101 14 2021-07-30
17 2021-07-31 103 2 2021-07-30
18 2021-07-31 101 82 2021-07-30
19 2021-07-31 103 15 2021-07-30
Note that although you seems to want to get the last business date of a month, we cannot simply use pd.offsets.BMonthEnd(n=0)
since 2021-07-31 will still be rolled forward to 2021-08-31.
When I first looked at the question, what immediately popped up to my mind is that probably we can make use of pd.offsets.BMonthEnd
to solve the question.
As I also know that for pd.offsets.MonthEnd
, we can use parameter n=0
(or simply 0
) to ensure the date is not rolled if on an anchor point (on an anchor point if the date is already on month-end date in this case). Hence, my initial trial is to use pd.offsets.BMonthEnd(n=0)
. To my surprise, it doesn't behave similarly as the MonthEnd counterpart. Date of 2021-07-31
is still rolled forward to 2021-08-31
.
As this kind of problem is very common, I would like to provide a workaround to enable us to still use BMonthEnd
instead of providing codes to check and modify for Sunday and Saturday.
Here's the workaround code to enable BMonthEnd
to behave similarly as MonthEnd(n=0)
:
df['business_month_end'] = df['date'] + pd.offsets.MonthEnd(0) - pd.offsets.MonthBegin() + pd.offsets.BMonthEnd()
Here, MonthEnd(0)
is the same as MonthEnd(n=0)
while MonthEnd()
and BMonthEnd()
without passing parameter is the same as passing n=1
(the default).
The mechanism is that we borrow the feature of MonthEnd(n=0)
to keep on anchor even on anchor point and get the month-begin of that date (should be the first date within the same month) and then apply the BMonthEnd
function to let it gets us the last business date of the same month (with adjustments if dates fall on Sunday & Saturday).
Pandas provides a solution using BMonthEnd in the docs.
If you want to get the end of the current business month:
>>> ts = pd.Timestamp(2022, 11, 30)
>>> pd.offsets.BMonthEnd().rollforward(ts)
Timestamp('2022-11-30 00:00:00')