I am new to dplyr, and I would be curious about a fast way to get from this data:
ID | Age | YearDied |
---|---|---|
100 | 2 | 2005 |
102 | 4 | NA |
103 | 1 | NA |
106 | 5 | 2002 |
108 | 1 | NA |
109 | 1 | NA |
110 | 4 | NA |
112 | 3 | NA |
To this data (counting every survivor per Age, meaning if an ID is 5 years old, it passed ages 1,2,3,4,5, whereas an ID of 2 just passes 2 ages (1,2) (does this make sense?):
Age | SurvivorNumber |
---|---|
1 | 8 |
2 | 5 |
3 | 4 |
4 | 3 |
5 | 1 |
Is it also possible to combine the previous result with these ones (number of IDs in the category)?:
Age | Current Number of IDs |
---|---|
1 | 3 |
2 | 1 |
3 | 1 |
4 | 2 |
5 | 1 |
This was my starting code for the last case:
groupedDf <- inputDf %>%
count(Age) %>%
group_by(Age = case_when(Age == 1 ~ '1',
TRUE ~ as.character(Age))) %>%
group_by(Age = case_when(Age == 2 ~ '2',
TRUE ~ as.character(Age))) %>%
group_by(Age = case_when(Age == 3 ~ '3',
TRUE ~ as.character(Age))) %>%
group_by(Age = case_when(Age == 4 ~ '4',
TRUE ~ as.character(Age))) %>%
group_by(Age = case_when(Age == 5 ~ '5',
TRUE ~ as.character(Age))) %>%
summarise(n = sum(n)) %>%
arrange(nchar(Age), Age)
It is easier to get the count table first and then generate SurvivorNumber through "reverse cumulative sum":
df %>%
group_by(Age) %>%
reframe(`Current Number of IDs`=n()) %>%
mutate(SurvivorNumber=rev(cumsum(rev(`Current Number of IDs`))))
# A tibble: 5 × 3
Age `Current Number of IDs` SurvivorNumber
<int> <int> <int>
1 1 3 8
2 2 1 5
3 3 1 4
4 4 2 3
5 5 1 1