I have a variation on this post where I need to build a comma-separated field of values based on codes across 5 columns. I need to examine each column to determine if any one of several different crops are present in that year. Each crop can be represented by multiple codes, though. I can do a series of grepl statements with each combination of codes, but that will get very long. Can the grepl pattern be built from variables? Or is there another way to do this?
Here is an example limiting the data to 4 years, 3 crops (corn, soybeans, and wheat), and just a couple different codes each.
corn_codes <- 1
soybean_codes <- c(2,446)
wheat_codes <- c(3,427)
df <-
crossing(yr2019=as.character(c(1,2,3,446,427)),
yr2020=as.character(c(1,2,3,446,427)),
yr2021=as.character(c(1,2,3,446,427)),
yr2022=as.character(c(1,2,3,446,427))) %>%
rowwise() %>%
mutate(new_column = case_when(
grepl("1,.*2,.*3|1,.*3,.*2|2,.*1,.*3|2,.*3,.*1|3,.*1,.*2|3,.*2,.*1",
paste(as.vector(distinct(as.data.frame(c(yr2019,yr2020,yr2021,yr2022))))[[1]],
collapse = ",")) ~ "1,2,3",
grepl("1,.*446,.*3|1,.*3,.*446|446,.*1,.*3|446,.*3,.*1|3,.*1,.*446|3,.*446,.*1",
paste(as.vector(distinct(as.data.frame(c(yr2019,yr2020,yr2021,yr2022))))[[1]],
collapse = ",")) ~ "1,2,3",
grepl("1,.*2,.*427|1,.*427,.*2|2,.*1,.*427|2,.*427,.*1|427,.*1,.*2|427,.*2,.*1",
paste(as.vector(distinct(as.data.frame(c(yr2019,yr2020,yr2021,yr2022))))[[1]],
collapse = ",")) ~ "1,2,3",
grepl("1,.*446,.*427|1,.*427,.*446|446,.*1,.*427|446,.*427,.*1|427,.*1,.*446|427,.*446,.*1",
paste(as.vector(distinct(as.data.frame(c(yr2019,yr2020,yr2021,yr2022))))[[1]],
collapse = ",")) ~ "1,2,3",
TRUE ~ "x"
) # end case_when
) %>% # end mutate
print(n=20)
First 20/625 rows:
# A tibble: 625 × 5
# Rowwise:
yr2019 yr2020 yr2021 yr2022 new_column
<chr> <chr> <chr> <chr> <chr>
1 1 1 1 1 x
2 1 1 1 2 x
3 1 1 1 3 x
4 1 1 1 427 x
5 1 1 1 446 x
6 1 1 2 1 x
7 1 1 2 2 x
8 1 1 2 3 1,2,3
9 1 1 2 427 1,2,3
10 1 1 2 446 x
11 1 1 3 1 x
12 1 1 3 2 1,2,3
13 1 1 3 3 x
14 1 1 3 427 1,2,3
15 1 1 3 446 1,2,3
16 1 1 427 1 x
17 1 1 427 2 1,2,3
18 1 1 427 3 x
19 1 1 427 427 x
20 1 1 427 446 1,2,3
# … with 605 more rows
This looks for any of the three crops occurring in the four years, in any order. There are two issues:
All suggestions are appreciated.
This just seems the wrong way to approach the analysis. It seems to me that it would be much easier to switch the codes for the crop names, then use a simple ifelse
to determine whether all crops are present:
df %>%
mutate(across(everything(), ~ case_when(.x %in% corn_codes ~ "corn",
.x %in% soybean_codes ~ "soybean",
.x %in% wheat_codes ~ "wheat",
TRUE ~ "other"))) %>%
rowwise() %>%
mutate(new_column = ifelse(all(c("corn", "soybean", "wheat") %in%
c_across(everything())), "1, 2, 3", "x"))
#> # A tibble: 625 x 5
#> # Rowwise:
#> yr2019 yr2020 yr2021 yr2022 new_column
#> <chr> <chr> <chr> <chr> <chr>
#> 1 corn corn corn corn x
#> 2 corn corn corn soybean x
#> 3 corn corn corn wheat x
#> 4 corn corn corn wheat x
#> 5 corn corn corn soybean x
#> 6 corn corn soybean corn x
#> 7 corn corn soybean soybean x
#> 8 corn corn soybean wheat 1, 2, 3
#> 9 corn corn soybean wheat 1, 2, 3
#> 10 corn corn soybean soybean x
#> 11 corn corn wheat corn x
#> 12 corn corn wheat soybean 1, 2, 3
#> 13 corn corn wheat wheat x
#> 14 corn corn wheat wheat x
#> 15 corn corn wheat soybean 1, 2, 3
#> 16 corn corn wheat corn x
#> 17 corn corn wheat soybean 1, 2, 3
#> 18 corn corn wheat wheat x
#> 19 corn corn wheat wheat x
#> 20 corn corn wheat soybean 1, 2, 3
#> # i 605 more rows
Created on 2023-08-23 with reprex v2.0.2