rdatetimedplyrlubridatezoo

Creating a new column with end of each month YMD date given the year&month (dplyr)


I want to create a new column that shows last day of each month in ymd format given the year&month in the same row. Something like this:

month_year    end_month
Jan 2022      31/01/2022
Jan 2022      31/01/2022
Feb 2022     28/02/2022
March 2022    31/03/2022
library(zoo)
library(tidyverse)

# Print data example with specific columns
dput(df[1:20,c(1,19,23)])

output:

structure(list(id = 1:20, posted_date = c("2023-03-12 00:00:00", 
"2023-03-13 00:00:00", "2023-03-20 00:00:00", "2023-03-08 00:00:00", 
"2023-03-16 00:00:00", "2023-03-20 00:00:00", "2023-03-09 00:00:00", 
"2023-03-20 00:00:00", "2023-03-13 00:00:00", "2023-03-21 00:00:00", 
"2023-03-07 00:00:00", "2023-03-08 00:00:00", "2023-03-19 00:00:00", 
"2023-03-19 00:00:00", "2023-03-12 00:00:00", "2023-03-12 00:00:00", 
"2023-03-20 00:00:00", "2023-03-20 00:00:00", "2023-03-09 00:00:00", 
"2023-03-12 00:00:00"), month_year = structure(c(2023.16666666667, 
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667, 
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667, 
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667, 
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667, 
2023.16666666667, 2023.16666666667, 2023.16666666667), class = "yearmon")), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))

I made a few attempts using the solution here as follows:

job_posts %>% 
   group_by(month_year = as.yearmon(Date)) %>%
   summarise(First = floor_date(first(Date), 'month'), 
             Last = ceiling_date(last(Date), 'month')-1)

But I keep receiving this error message, but I am confused because the "month_year" in my df is indeed in a yearmon format, not numeric, so I thought the code above should work? "Error in group_by(., month_year = as.yearmon(Date)) : Caused by error in as.numeric(): ! cannot coerce type 'closure' to vector of type 'double'"


Solution

  • There are three different possibilities for what you're wanting to do:

    1. Get the first and last posted_date in each month

    df |> mutate(first = first(posted_date), last = last(posted_date), .by = month_year)
    

    But the rows are not in order, and the first and last row in the data have the same date, so we end up getting the same date for the first and last columns, so I assume this isn't what you want.

    2. Get the earliest and latest dates in each month

    df |> mutate(first = min(posted_date), last = max(posted_date), .by = month_year)
    

    But obviously, if the data misses days then we won't get the absolute last day in each month.

    3. Get the first and last days of the month of each month_year

    df |> summarise(first = as.Date(month_year[1]), last = as.Date(month_year[1] + (1/12))-1, .by = month_year)
    

    Output:

    # A tibble: 1 × 3
      month_year first      last      
      <yearmon>  <date>     <date>    
    1 Mar 2023   2023-03-01 2023-03-31
    

    This final one, I believe, is what you are looking for.

    Error: cannot coerce type 'closure' to vector of type 'double

    The issue appears very likely to be, as Allan eloquently stated, that you are 'trying to perform arithmetic on the function `Date` from base R