I have a dataset with some variables that indicate whether an older person can or cannot do an activity (take the bus, bathing...). I have to create variables such as "In group C, the older person needs assistance to perform 2 activities including bathing" and "In group D, the older person needs assistance to perform 3 activities including bathing and dressing."
An observation cannot be in two groups. My dataset looks like this:
  bathing take_bus dressing eating
1       4        4        4      3
2       2        1        3      2
3       4        2        4      2
4       5        4        1      2
5       2        4        4      1
The numbers indicate the level of difficulty in doing the activity. I am only interested in level 4 or higher (the person cannot do the activity alone at all).
So, for example, individuals 3 and 4 here are in group C. Individual 1 is in group D and should NOT also be in group C. Individual 5 is not in group C because they can bathe alone.
I did something like this:
df$is_C <- ifelse(df$bathing >= 4 & (df$dressing >= 4 | df$eating >= 4 |
df$take_bus >= 4), 1, 0)
df$is_C <- factor(x = df$is_C, levels = c(1, 0), labels = c("Group_C", "Not_Group_C"))
df$is_D <- ifelse(df$bathing >= 4 & df$dressing >= 4 & ( df$eating >= 4 | df$take_bus >= 4), 1, 0)
df$is_D <- factor(x = df$is_D, levels = c(1, 0), labels = c("Group_D", "Not_Group_D"))
However when I do that:
> table(df$is_C, df$is_D)
              Group_D Not_Group_D
  Group_C         683         290
  Not_Group_C       0        9650
So 683 people are in group C when they should only be in group D. (It is OK for some people to be in neither group C nor group D, because I have other variables.)
What should I do?
Thank you all for your kindness and your answers!
Here is a solution.
In order to make it more readable, two functions are defined, both returning logical values. Then the logical values are used for mutual exclusion of groups C and D. When this is done, the values are coerced to integer and then to factor.
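# x is one row of df; apply() keeps the column names, so index by name rather than by position
f_is_C <- function(x, level = 4) x[["bathing"]] >= level && any(x[c("take_bus", "dressing", "eating")] >= level)
f_is_D <- function(x, level = 4) all(x[c("bathing", "dressing")] >= level) && any(x[c("take_bus", "eating")] >= level)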
is_D <- apply(df, 1, f_is_D)
is_C <- apply(df, 1, f_is_C) & !is_D # mutual exclusion
df$is_C <- factor(as.integer(is_C), levels = 1:0, labels = c("Group_C", "Not_Group_C"))
df$is_D <- factor(as.integer(is_D), levels = 1:0, labels = c("Group_D", "Not_Group_D"))
with(df, table(is_C, is_D))
#              is_D
# is_C          Group_D Not_Group_D
#   Group_C           0           2
#   Not_Group_C       1           2
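
If the row-wise apply() turns out to be slow on a large dataset, the same rules can probably be written in vectorised form. The following is only a sketch: it assumes the four column names from the question, rebuilds the five sample rows so that it runs on its own, and the names level, acts, needs_help, n_help, in_C, in_D and group are made up for the illustration. It counts how many activities need assistance with rowSums(), and it puts the result in a single factor (with an "Other" level, also an assumption) so that the groups are mutually exclusive by construction.

```r
# Sample data copied from the question (5 individuals, 4 activities)
df <- data.frame(
  bathing  = c(4, 2, 4, 5, 2),
  take_bus = c(4, 1, 2, 4, 4),
  dressing = c(4, 3, 4, 1, 4),
  eating   = c(3, 2, 2, 2, 1)
)

level <- 4                                   # threshold: cannot do the activity alone
acts  <- c("bathing", "take_bus", "dressing", "eating")

needs_help <- df[acts] >= level              # logical matrix, one column per activity
n_help     <- rowSums(needs_help)            # number of activities needing assistance

# Group D: bathing and dressing, plus at least one more activity
in_D <- needs_help[, "bathing"] & needs_help[, "dressing"] & n_help >= 3
# Group C: bathing plus at least one more activity, excluding anyone already in D
in_C <- needs_help[, "bathing"] & n_help >= 2 & !in_D

# One factor instead of two, so nobody can land in both groups
group <- factor(ifelse(in_D, "Group_D", ifelse(in_C, "Group_C", "Other")),
                levels = c("Group_C", "Group_D", "Other"))
table(group)
```

On the five sample rows this puts individual 1 in Group_D, individuals 3 and 4 in Group_C, and individuals 2 and 5 in Other, which agrees with the table above. If the two separate is_C / is_D factors are still needed, in_C and in_D can be passed to factor() exactly as in the answer.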