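I have a data frame of household members containing 3 numeric columns, 'hid' (household id), 'sub' (member id within the household), and 'age'. I'd like to create a new logical variable in the data frame called 'hh' representing the household head, defined as follows:
The household head is the member aged 18-65 with the smallest 'sub' id within the household.
If no member of the household is aged 18-65, the household head is the member with the smallest 'sub' id.
There must be 1 and only 1 household head per household.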
My data looks something like this:
# A tibble: 10 x 3
hid sub age
<dbl> <dbl> <dbl>
1 1 1 75
2 1 2 55
3 2 1 35
4 3 1 69
5 3 2 72
6 4 1 69
7 5 1 15
8 5 2 17
9 5 3 42
10 5 4 62
And I'd like the result to be like this:
> result
# A tibble: 10 x 4
hid sub age hh
<dbl> <dbl> <dbl> <lgl>
1 1 1 75 FALSE # Not 18-65 & there is another aged 18-65 within this household.
2 1 2 55 TRUE # Aged 18-65 and the smallest sub id within this household.
3 2 1 35 TRUE # Only 1 in this household.
4 3 1 69 TRUE # Not aged 18-65, but no other member is and smallest sub id.
5 3 2 72 FALSE # Not aged 18-65, and not the smallest sub id.
6 4 1 69 TRUE # Only 1 in this household.
7 5 1 15 FALSE # Not aged 18-65 and others in this household qualify.
8 5 2 17 FALSE # Not aged 18-65 and others in this household qualify.
9 5 3 42 TRUE # Aged 18-65 and the smallest sub id among those aged 18-65 within this household.
10 5 4 62 FALSE # Aged 18-65 but not the smallest sub id among those aged 18-65 within this household.
Thank you!
d <- structure(list(hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62)),
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
You can arrange the data so that the first row of each household (hid) is the household head, then flag that first row as hh.
library(dplyr)
d %>%
arrange(hid, !between(age, 18, 65), sub) %>%
mutate(hh = !duplicated(hid))
# hid sub age hh
# <dbl> <dbl> <dbl> <lgl>
# 1 1 2 55 TRUE
# 2 1 1 75 FALSE
# 3 2 1 35 TRUE
# 4 3 1 69 TRUE
# 5 3 2 72 FALSE
# 6 4 1 69 TRUE
# 7 5 3 42 TRUE
# 8 5 4 62 FALSE
# 9 5 1 15 FALSE
#10 5 2 17 FALSE
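!between(age, 18, 65) is FALSE for members aged 18-65 (both endpoints included) and TRUE for everyone else. Because arrange() sorts FALSE before TRUE, members aged 18-65 come first within each household, followed by those outside the range; ties are then broken by sub. !duplicated(hid) is TRUE only for the first row of each household, which is therefore the household head. Note that the rows end up in the sorted order rather than the original order.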
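If the original row order needs to be kept, one possible variant is a grouped mutate() instead of arrange(). This is only a sketch, assuming 'sub' is unique within each 'hid'; the helper columns in_range and candidate are names made up for illustration and are dropped at the end.

```r
library(dplyr)

# Same data as in the question
d <- tibble(
  hid = c(1, 1, 2, 3, 3, 4, 5, 5, 5, 5),
  sub = c(1, 2, 1, 1, 2, 1, 1, 2, 3, 4),
  age = c(75, 55, 35, 69, 72, 69, 15, 17, 42, 62)
)

result <- d %>%
  group_by(hid) %>%
  mutate(
    # TRUE for members aged 18-65 (inclusive)
    in_range = between(age, 18, 65),
    # Candidates are the 18-65 members if the household has any,
    # otherwise every member of the household (hypothetical helper)
    candidate = in_range | !any(in_range),
    # The head is the candidate with the smallest sub id
    # (assumes sub is unique within hid)
    hh = candidate & sub == min(sub[candidate])
  ) %>%
  ungroup() %>%
  select(-in_range, -candidate)

result
```

Here hh is computed row by row within each household, so the rows stay in the order of d and hh can be compared directly with the desired result in the question.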