My dataframe2
looks like this
dataframe2 = structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
Quarterly = c(2006.1, 2006.1, 2006.1, 2006.2, 2006.3, 2006.4, 2006.1, 2006.2, 2006.3,
2006.3, 2006.4, 2006.1, 2006.1, 2006.1, 2006.2, 2006.2, 2006.3, 2006.4),
Status = c("Employed", "Employed", "Employed", "Employed", "Null", "Employed", "Employed",
"Employed", "Employed", "Employed", "Employed", "Null", "Null", "Employed",
"Employed", "Employed", "Employed", "Employed")),
class = "data.frame", row.names = c(NA, -18L))
ID Quarterly Status
1 2006.1 Employed
1 2006.1 Employed
1 2006.1 Employed
1 2006.2 Employed
1 2006.3 Null
1 2006.4 Employed
2 2006.1 Employed
2 2006.2 Employed
2 2006.3 Employed
2 2006.3 Employed
2 2006.4 Employed
3 2006.1 Null
3 2006.1 Null
3 2006.1 Employed
3 2006.2 Employed
3 2006.2 Employed
3 2006.3 Employed
3 2006.4 Employed
I'm keen for it to look like this, so that there is only one observation per ID
ID Quarterly Status
1 2006.1. Employed
1 2006.2 Employed
1 2006.3 Null
1 2006.4 Employed
2 2006.1 Employed
2 2006.2 Employed
2 2006.3 Employed
2 2006.4 Employed
3 2006.1 Null
3 2006.2 Employed
3 2006.3 Employed
3 2006.4 Employed
I've tried a couple of options that I had found on this website, and none have worked as hoped. I did:
(a)
group_by(ID,Quarterly) %>% filer(n()>1)
and (b)
group_by(ID,Quarterly) %>% distinct(ID, keep.all = TRUE)
.
Option (b) just removed all the Quarterly
values except 2023.1
(the latest my dates go to).
The way to do it with dplyr
is using summarise
result <- dataframe2 |> summarise(.by = c("ID", "Quarterly"), first(Status))
For simplicity, I took the function first to summarize the group over ID and Quaterly