First, is the data and then the manipulations. Finally, is the current method that I am using and as of yet is producing no data. The manipulations are to create a date and then create a rolling 12-Month average.
Monthavg<-
c(20185,20186,20187,20188,20189,201810,201811,201812,20191,20192,20193,20194,20195,20196,
20197,20198,20199,201910,201911,201912,20201
,20202,20203,20204,20205,20206,20207
,20208,20209,202010,202011)
empavg<-c(2,4,6,7,8,10,12,14,16,18,20,22,24,26,28,30,32,36,36,38,40,42,44,46,48,48,50,52,52,54,56)
ces12f <- data.frame(Monthavg,empavg)
Manipulations
ces12f<- ces12f %>% mutate(year = substr(as.character(Monthavg),1,4),
month = substr(as.character(Monthavg),5,7),
date = as.Date(paste(year,month,"1",sep ="-")))
Month_ord <- order(Monthavg)
span_month=12
ces12f<-ces12f %>% mutate(ravg = zoo::rollmeanr(empavg, 12, fill = NA))
Annual difference attempt
ces12f<- ces12f%>%
group_by(Monthavg)%>%
mutate(PreviousYear=lag(ravg,12),
PreviousMonth=lag(ravg),
AnnualDifference=ravg-PreviousYear)%>%
ungroup()
The end goal would be that 202011 minus 201911 or 47.5 minus 25.17 or 22.3. The method that I use above produces nothing but NA's. Any insights as to how I can modify my existing code or simply use an entirely different method would be greatly appreciated.
I tend to be a little more paranoid. That is, if there is even a slight chance that we are missing one month of however many years we have, than doing a lag(..., 12)
is a bad idea, even worse because you will get no warnings or errors, and your data will be wrong.
As such, I'm going to recommend a self-join.
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month"))
# Monthavg empavg year month date ravg lastravg
# 1 20185 2 2018 5 2018-05-01 NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667
You can verify that each lastempavg
is the previous year's value, and you can mutate
the difference normally, perhaps
transmute(ces12f, year = as.character(as.integer(year) + 1L), month, lastravg = ravg) %>%
left_join(ces12f, ., by = c("year", "month")) %>%
mutate(AnnualDifference = ravg - lastravg)
# Monthavg empavg year month date ravg lastravg AnnualDifference
# 1 20185 2 2018 5 2018-05-01 NA NA NA
# 2 20186 4 2018 6 2018-06-01 NA NA NA
# 3 20187 6 2018 7 2018-07-01 NA NA NA
# 4 20188 7 2018 8 2018-08-01 NA NA NA
# 5 20189 8 2018 9 2018-09-01 NA NA NA
# 6 201810 10 2018 10 2018-10-01 NA NA NA
# 7 201811 12 2018 11 2018-11-01 NA NA NA
# 8 201812 14 2018 12 2018-12-01 NA NA NA
# 9 20191 16 2019 1 2019-01-01 NA NA NA
# 10 20192 18 2019 2 2019-02-01 NA NA NA
# 11 20193 20 2019 3 2019-03-01 NA NA NA
# 12 20194 22 2019 4 2019-04-01 11.58333 NA NA
# 13 20195 24 2019 5 2019-05-01 13.41667 NA NA
# 14 20196 26 2019 6 2019-06-01 15.25000 NA NA
# 15 20197 28 2019 7 2019-07-01 17.08333 NA NA
# 16 20198 30 2019 8 2019-08-01 19.00000 NA NA
# 17 20199 32 2019 9 2019-09-01 21.00000 NA NA
# 18 201910 36 2019 10 2019-10-01 23.16667 NA NA
# 19 201911 36 2019 11 2019-11-01 25.16667 NA NA
# 20 201912 38 2019 12 2019-12-01 27.16667 NA NA
# 21 20201 40 2020 1 2020-01-01 29.16667 NA NA
# 22 20202 42 2020 2 2020-02-01 31.16667 NA NA
# 23 20203 44 2020 3 2020-03-01 33.16667 NA NA
# 24 20204 46 2020 4 2020-04-01 35.16667 11.58333 23.58333
# 25 20205 48 2020 5 2020-05-01 37.16667 13.41667 23.75000
# 26 20206 48 2020 6 2020-06-01 39.00000 15.25000 23.75000
# 27 20207 50 2020 7 2020-07-01 40.83333 17.08333 23.75000
# 28 20208 52 2020 8 2020-08-01 42.66667 19.00000 23.66667
# 29 20209 52 2020 9 2020-09-01 44.33333 21.00000 23.33333
# 30 202010 54 2020 10 2020-10-01 45.83333 23.16667 22.66667
# 31 202011 56 2020 11 2020-11-01 47.50000 25.16667 22.33333
Side note on this: it might be better to keep the year and month stored as integer
, for a few reasons: (1) it makes this kind of thing quite easy; (2) it preserves ordinality, whereas arrange(ces12f, month)
will happily order the months as 1, 10, 11, 12, 2, etc; (3) (subjective) they really are integers, after all.