rdataframe

update a column of a dataframe using mutate and case_when


I have this sample data frame

df<-data.frame(old_farm=c("Yes", "Yes","Yes", "No", "No", "No" , NA ),
env_year=c(2011, 2020,2019,2010,2010,2010, NA),
global_M=c("Yes", "Yes", "No", "Yes","Yes", "No", NA ),
audit_year=c(2014, NA,NA,2010,NA,NA,NA),
Situation=c("New costat 2011, Closed antex 2014",
   "New costat 2020",
 "New costat 2019",
"Closed antex 2010",
"missing entry",
"Full exit", "Not at all"))

df
  old_farm env_year global_M audit_year                          Situation
1      Yes     2011      Yes       2014 New costat 2011, Closed antex 2014
2      Yes     2020      Yes         NA                    New costat 2020
3      Yes     2019       No         NA                    New costat 2019
4       No     2010      Yes       2010                  Closed antex 2010
5       No     2010      Yes         NA                      missing entry
6       No     2010       No         NA                          Full exit
7     <NA>       NA     <NA>         NA                         Not at all

I want to update the last column "Situation". In particular,

  1. if old_farm == "Yes" & global_M == "Yes" & !is.na(audit_year), then I want to have in the Situation column: "New costat xa", env_year, "Closed antex xa", audit_year; where the env_year and audit_year are the years from the corresponding columns.
  2. and if old_farm == "No" & global_M == "Yes" & !is.na(audit_year), then I want to have in the last column "Closed antex xa", audit_year, where the audit_year is the year from the corresponding column.

So I want to have

df_n<-data.frame(old_farm=c("Yes", "Yes","Yes", "No", "No", "No" , NA ),
env_year=c(2011, 2020,2019,2010,2010,2010, NA),
global_M=c("Yes", "Yes", "No", "Yes","Yes", "No", NA ),
audit_year=c(2014, NA,NA,2010,NA,NA,NA),
Situation=c("New costat xa2011, Closed antex xa2014",
   "New costat 2020",
 "New costat 2019",
"Closed antex xa2010",
"missing entry",
"Full exit", "Not at all"))


df_n
 old_farm env_year global_M audit_year                              Situation
1      Yes     2011      Yes       2014 New costat xa2011, Closed antex xa2014
2      Yes     2020      Yes         NA                        New costat 2020
3      Yes     2019       No         NA                        New costat 2019
4       No     2010      Yes       2010                    Closed antex xa2010
5       No     2010      Yes         NA                          missing entry
6       No     2010       No         NA                              Full exit
7     <NA>       NA     <NA>         NA                             Not at all

So I executed these three alternative commands but none of them works

    library(dplyr)
    
   
 df %>%
  mutate(
Situation = case_when(
    old_farm == "Yes" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("New costat xa", env_year, ", Closed antex xa", audit_year),
old_farm == "No" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("Closed antex xa",  audit_year)      ))
 


 df %>%
  mutate(across("Situation") , case_when(
    old_farm == "Yes" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("New costat xa", env_year, ", Closed antex xa", audit_year) ,
old_farm == "No" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("Closed antex xa",  audit_year)   ))


df %>%
  mutate_at("Situation",~ case_when(
    old_farm == "Yes" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("New costat xa", env_year, ", Closed antex xa", audit_year),
old_farm == "No" & global_M == "Yes" & !is.na(audit_year) ~
      paste0("Closed antex xa",  audit_year)  ))

I am not sure where the mistake is. My real data set contains thousands of rows.


Solution

  • Without room to improve there's nothing wrong with your approach. Maybe you just missed to use .default

    library(dplyr)
    
    df %>% 
      mutate(Situation = 
        case_when(
          old_farm == "Yes" & global_M == "Yes" & !is.na(audit_year) ~ 
            paste0("New costat xa", env_year, ", Closed antex xa", audit_year), 
          old_farm == "No" & global_M == "Yes" & !is.na(audit_year) ~ 
            paste0("Closed antex xa", audit_year), .default = Situation))
      old_farm env_year global_M audit_year                              Situation
    1      Yes     2011      Yes       2014 New costat xa2011, Closed antex xa2014
    2      Yes     2020      Yes         NA                        New costat 2020
    3      Yes     2019       No         NA                        New costat 2019
    4       No     2010      Yes       2010                    Closed antex xa2010
    5       No     2010      Yes         NA                          missing entry
    6       No     2010       No         NA                              Full exit
    7     <NA>       NA     <NA>         NA                             Not at all