I want to create a new column that shows last day of each month in ymd format given the year&month in the same row. Something like this:
month_year end_month
Jan 2022 31/01/2022
Jan 2022 31/01/2022
Feb 2022 28/02/2022
March 2022 31/03/2022
library(zoo)
library(tidyverse)
# Print data example with specific columns
dput(df[1:20,c(1,19,23)])
output:
structure(list(id = 1:20, posted_date = c("2023-03-12 00:00:00",
"2023-03-13 00:00:00", "2023-03-20 00:00:00", "2023-03-08 00:00:00",
"2023-03-16 00:00:00", "2023-03-20 00:00:00", "2023-03-09 00:00:00",
"2023-03-20 00:00:00", "2023-03-13 00:00:00", "2023-03-21 00:00:00",
"2023-03-07 00:00:00", "2023-03-08 00:00:00", "2023-03-19 00:00:00",
"2023-03-19 00:00:00", "2023-03-12 00:00:00", "2023-03-12 00:00:00",
"2023-03-20 00:00:00", "2023-03-20 00:00:00", "2023-03-09 00:00:00",
"2023-03-12 00:00:00"), month_year = structure(c(2023.16666666667,
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667,
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667,
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667,
2023.16666666667, 2023.16666666667, 2023.16666666667, 2023.16666666667,
2023.16666666667, 2023.16666666667, 2023.16666666667), class = "yearmon")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
I made a few attempts using the solution here as follows:
job_posts %>%
group_by(month_year = as.yearmon(Date)) %>%
summarise(First = floor_date(first(Date), 'month'),
Last = ceiling_date(last(Date), 'month')-1)
But I keep receiving this error message, but I am confused because the "month_year" in my df is indeed in a yearmon format, not numeric, so I thought the code above should work?
"Error in group_by(., month_year = as.yearmon(Date)) :
Caused by error in as.numeric()
:
! cannot coerce type 'closure' to vector of type 'double'"
There are three different possibilities for what you're wanting to do:
df |> mutate(first = first(posted_date), last = last(posted_date), .by = month_year)
But the rows are not in order, and the first and last row in the data have the same date, so we end up getting the same date for the first and last columns, so I assume this isn't what you want.
df |> mutate(first = min(posted_date), last = max(posted_date), .by = month_year)
But obviously, if the data misses days then we won't get the absolute last day in each month.
df |> summarise(first = as.Date(month_year[1]), last = as.Date(month_year[1] + (1/12))-1, .by = month_year)
Output:
# A tibble: 1 × 3
month_year first last
<yearmon> <date> <date>
1 Mar 2023 2023-03-01 2023-03-31
This final one, I believe, is what you are looking for.
The issue appears very likely to be, as Allan eloquently stated, that you are 'trying to perform arithmetic on the function `Date` from base R