I have a dataset that looks like this:
library(tidyverse)
name<-c("AAA","AAA","AAA")
value<-c(1:3)
order<-c(2,3,6)
tag<-c(0,0,0)
part_a<-data.frame(name,value,order,tag)
name<-c("AAA","AAA","AAA")
value<-c(1:3)
order<-c(1,5,4)
key<-c(1,1,1)
part_b<-data.frame(name,value,order,key)
df<-bind_rows(part_a,part_b)
df %>% group_by(name) %>% arrange(order)
Groups: name [1]
name value order tag key
<chr> <int> <dbl> <dbl> <dbl>
1 AAA 1 1 NA 1
2 AAA 1 2 0 NA
3 AAA 2 3 0 NA
4 AAA 3 4 NA 1
5 AAA 2 5 NA 1
6 AAA 3 6 0 NA
I am trying to write a code that will allow me to create two new variables named CHECK_1
and CHECK_2
, CHECK_1
needs to have two values it will be 1 if the previous value of the column tag
is NA and the previous value of the column key
is not NA and 0 otherwise, the variable CHECK_2 needs to have only two value it should be "out of range" if the difference between the current and previous value of the column order
is greater than 3 and "within range" otherwise.
This operations must be done within groups in a data frame, I know I can make use of the base R function lag()
but I have tried to incorporate the criteria in a if_else statement with no luck.
My desired output will be:
name | value | order | tag | key | check_1 | check_2 |
---|---|---|---|---|---|---|
AAA | 1 | 1 | NA | 1 | 1 | "within range" |
AAA | 1 | 2 | 0 | NA | 0 | "within range" |
AAA | 2 | 3 | 0 | NA | 0 | "within range" |
AAA | 3 | 4 | NA | 1 | 1 | "within range" |
AAA | 2 | 5 | NA | 1 | 1 | "within range" |
AAA | 3 | 6 | 0 | NA | 0 | "within range" |
You could do something like this with tidyverse
. However, I am unsure what you want to do with the first row of values for the group.
library(tidyverse)
df %>%
group_by(name) %>%
arrange(order) %>%
mutate(check_1 = ifelse(is.na(lag(tag)) & !is.na(lag(key)), 1, 0),
check_2 = ifelse(order - lag(order) > 3, "out of range", "within range"))
Output
name value order tag key check_1 check_2
<chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
1 AAA 1 1 NA 1 0 NA
2 AAA 1 2 0 NA 1 within range
3 AAA 2 3 0 NA 0 within range
4 AAA 3 4 NA 1 0 within range
5 AAA 2 5 NA 1 1 within range
6 AAA 3 6 0 NA 1 within range
If you are wanting to default the first row values to 1 and "within range", then we could use case_when
to make an extra ifelse condition.
df %>%
group_by(name) %>%
arrange(order) %>%
mutate(check_1 = case_when(row_number() == 1 ~ 1,
is.na(lag(tag)) & !is.na(lag(key)) ~ 1,
TRUE ~ 0),
check_2 = case_when(row_number() == 1 ~ "within range",
order - lag(order) > 3 ~ "out of range",
TRUE ~ "within range"))
name value order tag key check_1 check_2
<chr> <int> <dbl> <dbl> <dbl> <dbl> <chr>
1 AAA 1 1 NA 1 1 within range
2 AAA 1 2 0 NA 1 within range
3 AAA 2 3 0 NA 0 within range
4 AAA 3 4 NA 1 0 within range
5 AAA 2 5 NA 1 1 within range
6 AAA 3 6 0 NA 1 within range