I am failing to find a solution for trading research.
I have a large dataset of market data running from 1993-2024.
Since calendar days do not match the trading days. It might be implemented by "business days", but who knows if holidays/business days match the trading days. I need to get a trading day of month if certain conditions are met.
Day Open High Low Close Volume trading_day
Date
2010-01-04 Monday 112.370003 113.389999 111.510002 113.330002 118944600 1
2010-01-05 Tuesday 113.260002 113.680000 112.849998 113.629997 111579900 2
2010-01-06 Wednesday 113.519997 113.989998 113.430000 113.709999 116074400 3
2010-01-07 Thursday 113.500000 114.330002 113.180000 114.190002 131091100 4
2010-01-08 Friday 113.889999 114.620003 113.660004 114.570000 126402800 5
2010-01-11 Monday 115.080002 115.129997 114.239998 114.730003 106375700 6
2010-01-12 Tuesday 113.970001 114.209999 113.220001 113.660004 163333500 7
2010-01-13 Wednesday 113.949997 114.940002 113.370003 114.620003 161822000 8
2010-01-14 Thursday 114.489998 115.139999 114.419998 114.930000 115718800 9
2010-01-15 Friday 114.730003 114.839996 113.199997 113.639999 212283100 10
2010-01-19 Tuesday 113.620003 115.129997 113.589996 115.059998 139172700 11
2010-01-20 Wednesday 114.279999 114.449997 112.980003 113.889999 216490200 12
2010-01-21 Thursday 113.919998 114.269997 111.559998 111.699997 344859600 13
2010-01-22 Friday 111.199997 111.739998 109.089996 109.209999 345942400 14
2010-01-25 Monday 110.209999 110.410004 109.410004 109.769997 186937500 15
2010-01-26 Tuesday 109.339996 110.470001 109.040001 109.309998 211168800 16
2010-01-27 Wednesday 109.169998 110.080002 108.330002 109.830002 271863600 17
2010-01-28 Thursday 110.190002 110.250000 107.910004 108.570000 316104000 18
2010-01-29 Friday 109.040001 109.800003 107.220001 107.389999 310677600 19
Above by is an example what I am trying to achieve. I managed that by selecting one month for testing purpose with following code.
df_test['trading_day'] = range(1, len(df_test) + 1)
Of course, there are about 300+ months since 1993 so it would be hell to do it all manually.
I managed to df.groupby
by month with the following code:
df_grouped_monthly = df.groupby(pd.Grouper(freq='M'))
and tried to apply same counter as above by. Sadly, it doesn't work on grouped df. I tried .transform(add(1))
Possible solutions are either get grouped row position - no idea whether row number is inherited from initial df, or it is counted based on groupby.
Or add above by mentioned counter to grouped df. I failed to add extra column to grouped df.
Any suggestions how to achieve above by df on whole dataset based on monthly basis, please?
You can convert the 'Date'
column to datetime and create new column before grouping.
df["Month"] = pd.to_datetime(df.Date, format='%b', errors='coerce').dt.month
Output DateFrame:
Date Day Open High Low Close Volume Month
2010-01-04 Monday 112.370003 113.389999 111.510002 113.330002 118944600 1
2010-01-05 Tuesday 113.260002 113.68 112.849998 113.629997 111579900 1
2010-01-06 Wednesday 113.519997 113.989998 113.43 113.709999 116074400 1
2010-01-07 Thursday 113.5 114.330002 113.18 114.190002 131091100 1
2010-01-08 Friday 113.889999 114.620003 113.660004 114.57 126402800 1
2010-01-11 Monday 115.080002 115.129997 114.239998 114.730003 106375700 1
2010-01-12 Tuesday 113.970001 114.209999 113.220001 113.660004 163333500 1
2010-01-13 Wednesday 113.949997 114.940002 113.370003 114.620003 161822000 1
2010-01-14 Thursday 114.489998 115.139999 114.419998 114.93 115718800 1
2010-01-15 Friday 114.730003 114.839996 113.199997 113.639999 212283100 1
2010-01-19 Tuesday 113.620003 115.129997 113.589996 115.059998 139172700 1
2010-01-20 Wednesday 114.279999 114.449997 112.980003 113.889999 216490200 1
2010-01-21 Thursday 113.919998 114.269997 111.559998 111.699997 344859600 1
2010-01-22 Friday 111.199997 111.739998 109.089996 109.209999 345942400 1
2010-01-25 Monday 110.209999 110.410004 109.410004 109.769997 186937500 1
2010-01-26 Tuesday 109.339996 110.470001 109.040001 109.309998 211168800 1
2010-01-27 Wednesday 109.169998 110.080002 108.330002 109.830002 271863600 1
2010-01-28 Thursday 110.190002 110.25 107.910004 108.57 316104000 1
2010-01-29 Friday 109.040001 109.800003 107.220001 107.389999 310677600 1
2010-02-04 Saturday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-05 Sunday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-06 Monday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-07 Tuesday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-08 Wednesday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-09 Thursday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-10 Friday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-11 Saturday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-12 Sunday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-13 Monday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-14 Tuesday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-15 Wednesday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-16 Thursday 109.040001 109.800003 107.220001 107.389999 310677600 2
2010-02-17 Friday 109.040001 109.800003 107.220001 107.389999 310677600 2
You can use this modified dataframe to find the information you are looking for.
#For instance, adding 'trading_Day' column
df['Trading_day']= df.groupby(['Month'], sort=True).cumcount()+1
Output:
Date Day Month Trading_day
0 2010-01-04 Monday 1 1
1 2010-01-05 Tuesday 1 2
2 2010-01-06 Wednesday 1 3
3 2010-01-07 Thursday 1 4
4 2010-01-08 Friday 1 5
5 2010-01-11 Monday 1 6
6 2010-01-12 Tuesday 1 7
7 2010-01-13 Wednesday 1 8
8 2010-01-14 Thursday 1 9
9 2010-01-15 Friday 1 10
10 2010-01-19 Tuesday 1 11
11 2010-01-20 Wednesday 1 12
12 2010-01-21 Thursday 1 13
13 2010-01-22 Friday 1 14
14 2010-01-25 Monday 1 15
15 2010-01-26 Tuesday 1 16
16 2010-01-27 Wednesday 1 17
17 2010-01-28 Thursday 1 18
18 2010-01-29 Friday 1 19
19 2010-02-04 Saturday 2 1
20 2010-02-05 Sunday 2 2
21 2010-02-06 Monday 2 3
22 2010-02-07 Tuesday 2 4
23 2010-02-08 Wednesday 2 5
24 2010-02-09 Thursday 2 6
25 2010-02-10 Friday 2 7
26 2010-02-11 Saturday 2 8
27 2010-02-12 Sunday 2 9
28 2010-02-13 Monday 2 10
29 2010-02-14 Tuesday 2 11
30 2010-02-15 Wednesday 2 12
31 2010-02-16 Thursday 2 13
32 2010-02-17 Friday 2 14
You can then perform your analysis.
# When is Closing value > Opening Value
grouped_df = df.query("Close >= Open")[['Month', 'trading_day']]
print(grouped_df)
Output:
Month trading_day
0 1 1
1 1 2
2 1 3
3 1 4
4 1 5
7 1 8
8 1 9
10 1 11
16 1 17