I have the following dataframe df_2, with columns "Date", "Pre_mx" and "Prec_mn". The "Date" column is already in a "Date" format.
Date Pre_mx Pre_mn
1 2011-01-01 24.00 96.40
2 2011-02-01 36.40 109.90
3 2011-03-01 38.00 91.90
4 2011-04-01 22.50 64.00
5 2011-05-01 10.60 18.50
6 2011-06-01 0.00 0.00
7 2011-07-01 0.00 0.00
8 2011-08-01 0.00 0.00
9 2011-09-01 20.50 20.50
10 2011-10-01 22.00 28.00
11 2011-11-01 27.50 64.50
12 2011-12-01 40.00 139.40
13 2012-01-01 55.00 351.70
14 2012-02-01 28.40 99.00
15 2012-03-01 21.10 54.20
16 2012-04-01 30.50 40.50
17 2012-05-01 6.20 11.80
18 2012-06-01 0.00 0.00
19 2012-07-01 0.00 0.00
20 2012-08-01 0.00 0.00
21 2012-09-01 0.00 0.00
22 2012-10-01 63.60 136.10
23 2012-11-01 30.00 90.30
24 2012-12-01 58.30 291.80
25 2013-01-01 28.60 70.30
26 2013-02-01 13.20 58.60
27 2013-03-01 22.00 30.80
28 2013-04-01 32.80 75.10
29 2013-05-01 17.90 52.80
30 2013-06-01 0.00 0.00
31 2013-07-01 0.00 0.00
32 2013-08-01 0.00 0.00
33 2013-09-01 0.40 0.40
34 2013-10-01 79.00 86.40
35 2013-11-01 12.30 15.00
36 2013-12-01 23.10 66.20
37 2014-01-01 24.20 63.50
38 2014-02-01 44.50 77.50
39 2014-03-01 13.20 26.40
40 2014-04-01 2.80 4.80
41 2014-05-01 45.20 63.20
42 2014-06-01 1.50 1.50
43 2014-07-01 0.00 0.00
44 2014-08-01 0.00 0.00
45 2014-09-01 2.50 2.70
46 2014-10-01 5.10 16.50
47 2014-11-01 32.50 74.30
48 2014-12-01 40.00 101.80
49 2015-01-01 66.00 164.80
50 2015-02-01 40.90 126.60
51 2015-03-01 13.00 77.00
52 2015-04-01 8.00 11.70
53 2015-05-01 20.70 37.70
54 2015-06-01 0.00 0.00
55 2015-07-01 0.00 0.00
56 2015-08-01 0.00 0.00
57 2015-09-01 13.70 14.30
58 2015-10-01 24.50 56.70
59 2015-11-01 33.70 44.00
60 2015-12-01 6.40 14.30
61 2016-01-01 23.00 100.20
62 2016-02-01 26.00 47.10
63 2016-03-01 24.50 52.10
64 2016-04-01 12.20 12.20
65 2016-05-01 27.10 41.90
66 2016-06-01 0.00 0.00
67 2016-07-01 0.00 0.00
68 2016-08-01 0.00 0.00
69 2016-09-01 3.10 4.40
70 2016-10-01 19.60 19.60
71 2016-11-01 29.20 43.50
72 2016-12-01 44.50 207.30
73 2017-01-01 41.00 121.70
74 2017-02-01 1.70 4.00
75 2017-03-01 28.00 81.10
76 2017-04-01 4.30 6.90
77 2017-05-01 6.50 14.30
78 2017-06-01 0.00 0.00
79 2017-07-01 0.00 0.00
80 2017-08-01 5.90 5.90
81 2017-09-01 0.00 0.00
82 2017-10-01 47.40 75.00
83 2017-11-01 24.70 71.90
84 2017-12-01 10.00 45.40
85 2018-01-01 53.70 187.60
86 2018-02-01 20.00 50.20
87 2018-03-01 22.00 33.20
88 2018-04-01 0.50 0.50
89 2018-05-01 37.20 47.40
90 2018-06-01 17.00 28.90
91 2018-07-01 0.00 0.00
92 2018-08-01 0.00 0.00
93 2018-09-01 0.80 1.10
94 2018-10-01 16.90 29.30
95 2018-11-01 36.20 69.90
96 2018-12-01 38.80 227.80
97 2019-01-01 39.70 264.30
98 2019-02-01 28.00 121.80
99 2019-03-01 28.00 104.80
100 2019-04-01 8.50 20.40
101 2019-05-01 0.00 0.00
102 2019-06-01 13.70 13.70
103 2019-07-01 0.01 0.01
104 2019-08-01 0.00 0.00
105 2019-09-01 0.00 0.00
106 2019-10-01 24.50 29.60
107 2019-11-01 13.80 33.50
108 2019-12-01 54.00 274.50
109 2020-01-01 99.00 231.30
110 2020-02-01 29.50 76.70
111 2020-03-01 17.20 34.10
112 2020-04-01 10.30 32.51
113 2020-05-01 2.20 3.00
114 2020-06-01 0.00 0.00
115 2020-07-01 0.00 0.00
116 2020-08-01 0.00 0.00
117 2020-09-01 0.00 0.00
118 2020-10-01 0.00 0.00
119 2020-11-01 25.20 86.80
120 2020-12-01 46.60 112.30
What I want to do, is to find the maximum value of column "Pre_mx" for each month, and keep the associated value for Prec_mn. I have used the following code:
df_3 = df_2 %>% group_by( Month =lubridate::month(as.Date(Date), label=TRUE)) %>%
summarise(Pd_max=max(Pre_mx),Pmn_max =max(Pre_mn) )
and got the following result:
What I want to get, though, is the information from the remaining columns as well. For example, the maximum value for January is observed at the 1/1/2020 in df_2, which contains the values Pre_mx = 99 and Pre_mn = 231.30. Hence, I would like something like the following:
Date Pre_mx Pre_mn Month Pd_max Pmn_max
2020-01-01 99.00 231.30 January 99.00 351.70
......
This will include any rows which have the max Pre_mx or max Pre_mn for the month. Those might occur in different rows, or their might be ties, which would all be included.
df |>
mutate(month = lubridate::month(Date, label = TRUE)) |>
filter(Pd_max == max(Pre_mx) | Pmn_max == max(Pre_mn), .by = month)
If you want a single row per month, you could arrange(-Pre_mx, -Pre_mn)
to put the largest Pre_mx first with tiebreaking from Pre_mn (or adjust for whatever tiebreaker logic you want), then use slice()
to get the first row per month.