I have the following dataframe:
state_territory_code week_nbr
CA WEEK 49
CA WEEK 49
FL WEEK 49
CA WEEK 50
TN WEEK 50
HI WEEK 50
GA WEEK 51
FL WEEK 51
I am trying to essentially perform the following:
The output should be:
state_territory_code week_nbr new_cal_date
CA WEEK 49 01/01/2024
CA WEEK 49 01/01/2024
FL WEEK 49 01/01/2024
CA WEEK 50 01/08/2024
TN WEEK 50 01/08/2024
HI WEEK 50 01/08/2024
GA WEEK 51 01/15/2024
FL WEEK 51 01/15/2024
Here is my code:
def cal_week_start(some_df):
start_dt = datetime.datetime(2024, 1, 1)
for i, wk in enumerate(some_df):
if i < (len(some_df)-1): ## Keep running as long as we haven't reached end of DF
next_value = some_df[i+1] ## index pos of next value
if wk != next_value: ## if value changes from one to the next do something
new_dt = start_dt + timedelta(days = 7) ## Add 7 days.
start_dt = new_dt ## reinitiate variable to add 7 days, 14, 21, etc.
return start_dt
else:
return start_dt
df_new['new_cal_date'] = df_new['week_nbr'].apply(cal_week_start)
It's a little wonky I know/not fool-proof, but I just need it to execute on a large dataset. It works when I test it outside of the function for some reason, not sure why it just repeats 01/08/2024 all the way through. Any help would be greatly appreciated. I am still learning.
I would slightly adjust your assumptions:
>>> import datetime
>>> start_dt = datetime.datetime(2024, 1, 1)
>>> min_week = min(df.week_nbr.str.rsplit(" ").str[-1].astype(int))
>>> df["new_cal_date"] = datetime.timedelta(days=7)*df.week_nbr.str.rsplit(" ").str[-1].astype(int).sub(min_week) + start_dt
>>> df
state_territory_code week_nbr new_cal_date
0 CA WEEK 49 2024-01-01
1 CA WEEK 49 2024-01-01
2 FL WEEK 49 2024-01-01
3 CA WEEK 50 2024-01-08
4 TN WEEK 50 2024-01-08
5 HI WEEK 50 2024-01-08
6 GA WEEK 51 2024-01-15
7 FL WEEK 51 2024-01-15
Explanation: you probably mean to map week, per its number to date - not rely on order (even if it's given).
You also will be better off using vectorized computing, apply
is usually slower.
The only caveat here is, if year marker changes - but you can easily mitigate it with grouping/conditional mapping.