rdategroup-by

Find maximum values for each month and keep information from other columns in dataframe in R


I have the following dataframe df_2, with columns "Date", "Pre_mx" and "Prec_mn". The "Date" column is already in a "Date" format.

           Date Pre_mx Pre_mn
1   2011-01-01  24.00  96.40
2   2011-02-01  36.40 109.90
3   2011-03-01  38.00  91.90
4   2011-04-01  22.50  64.00
5   2011-05-01  10.60  18.50
6   2011-06-01   0.00   0.00
7   2011-07-01   0.00   0.00
8   2011-08-01   0.00   0.00
9   2011-09-01  20.50  20.50
10  2011-10-01  22.00  28.00
11  2011-11-01  27.50  64.50
12  2011-12-01  40.00 139.40
13  2012-01-01  55.00 351.70
14  2012-02-01  28.40  99.00
15  2012-03-01  21.10  54.20
16  2012-04-01  30.50  40.50
17  2012-05-01   6.20  11.80
18  2012-06-01   0.00   0.00
19  2012-07-01   0.00   0.00
20  2012-08-01   0.00   0.00
21  2012-09-01   0.00   0.00
22  2012-10-01  63.60 136.10
23  2012-11-01  30.00  90.30
24  2012-12-01  58.30 291.80
25  2013-01-01  28.60  70.30
26  2013-02-01  13.20  58.60
27  2013-03-01  22.00  30.80
28  2013-04-01  32.80  75.10
29  2013-05-01  17.90  52.80
30  2013-06-01   0.00   0.00
31  2013-07-01   0.00   0.00
32  2013-08-01   0.00   0.00
33  2013-09-01   0.40   0.40
34  2013-10-01  79.00  86.40
35  2013-11-01  12.30  15.00
36  2013-12-01  23.10  66.20
37  2014-01-01  24.20  63.50
38  2014-02-01  44.50  77.50
39  2014-03-01  13.20  26.40
40  2014-04-01   2.80   4.80
41  2014-05-01  45.20  63.20
42  2014-06-01   1.50   1.50
43  2014-07-01   0.00   0.00
44  2014-08-01   0.00   0.00
45  2014-09-01   2.50   2.70
46  2014-10-01   5.10  16.50
47  2014-11-01  32.50  74.30
48  2014-12-01  40.00 101.80
49  2015-01-01  66.00 164.80
50  2015-02-01  40.90 126.60
51  2015-03-01  13.00  77.00
52  2015-04-01   8.00  11.70
53  2015-05-01  20.70  37.70
54  2015-06-01   0.00   0.00
55  2015-07-01   0.00   0.00
56  2015-08-01   0.00   0.00
57  2015-09-01  13.70  14.30
58  2015-10-01  24.50  56.70
59  2015-11-01  33.70  44.00
60  2015-12-01   6.40  14.30
61  2016-01-01  23.00 100.20
62  2016-02-01  26.00  47.10
63  2016-03-01  24.50  52.10
64  2016-04-01  12.20  12.20
65  2016-05-01  27.10  41.90
66  2016-06-01   0.00   0.00
67  2016-07-01   0.00   0.00
68  2016-08-01   0.00   0.00
69  2016-09-01   3.10   4.40
70  2016-10-01  19.60  19.60
71  2016-11-01  29.20  43.50
72  2016-12-01  44.50 207.30
73  2017-01-01  41.00 121.70
74  2017-02-01   1.70   4.00
75  2017-03-01  28.00  81.10
76  2017-04-01   4.30   6.90
77  2017-05-01   6.50  14.30
78  2017-06-01   0.00   0.00
79  2017-07-01   0.00   0.00
80  2017-08-01   5.90   5.90
81  2017-09-01   0.00   0.00
82  2017-10-01  47.40  75.00
83  2017-11-01  24.70  71.90
84  2017-12-01  10.00  45.40
85  2018-01-01  53.70 187.60
86  2018-02-01  20.00  50.20
87  2018-03-01  22.00  33.20
88  2018-04-01   0.50   0.50
89  2018-05-01  37.20  47.40
90  2018-06-01  17.00  28.90
91  2018-07-01   0.00   0.00
92  2018-08-01   0.00   0.00
93  2018-09-01   0.80   1.10
94  2018-10-01  16.90  29.30
95  2018-11-01  36.20  69.90
96  2018-12-01  38.80 227.80
97  2019-01-01  39.70 264.30
98  2019-02-01  28.00 121.80
99  2019-03-01  28.00 104.80
100 2019-04-01   8.50  20.40
101 2019-05-01   0.00   0.00
102 2019-06-01  13.70  13.70
103 2019-07-01   0.01   0.01
104 2019-08-01   0.00   0.00
105 2019-09-01   0.00   0.00
106 2019-10-01  24.50  29.60
107 2019-11-01  13.80  33.50
108 2019-12-01  54.00 274.50
109 2020-01-01  99.00 231.30
110 2020-02-01  29.50  76.70
111 2020-03-01  17.20  34.10
112 2020-04-01  10.30  32.51
113 2020-05-01   2.20   3.00
114 2020-06-01   0.00   0.00
115 2020-07-01   0.00   0.00
116 2020-08-01   0.00   0.00
117 2020-09-01   0.00   0.00
118 2020-10-01   0.00   0.00
119 2020-11-01  25.20  86.80
120 2020-12-01  46.60 112.30

What I want to do, is to find the maximum value of column "Pre_mx" for each month, and keep the associated value for Prec_mn. I have used the following code:

df_3 = df_2 %>% group_by( Month =lubridate::month(as.Date(Date), label=TRUE)) %>% 
              summarise(Pd_max=max(Pre_mx),Pmn_max =max(Pre_mn) )

and got the following result:

enter image description here

What I want to get, though, is the information from the remaining columns as well. For example, the maximum value for January is observed at the 1/1/2020 in df_2, which contains the values Pre_mx = 99 and Pre_mn = 231.30. Hence, I would like something like the following:

Date      Pre_mx Pre_mn   Month Pd_max Pmn_max
2020-01-01 99.00 231.30   January 99.00 351.70
......

Solution

  • This will include any rows which have the max Pre_mx or max Pre_mn for the month. Those might occur in different rows, or their might be ties, which would all be included.

    df |>
      mutate(month = lubridate::month(Date, label = TRUE)) |>
      filter(Pd_max == max(Pre_mx) | Pmn_max == max(Pre_mn), .by = month)
    

    If you want a single row per month, you could arrange(-Pre_mx, -Pre_mn) to put the largest Pre_mx first with tiebreaking from Pre_mn (or adjust for whatever tiebreaker logic you want), then use slice() to get the first row per month.