r dataframe dplyr conditional-statements method-missing

Conditionally filling missing data based on other variables in R



Sorry for the screenshot. I downloaded the data from https://www.kaggle.com/datasets/rikdifos/credit-card-approval-prediction

Can someone tell me how to fill the NA values in the occupation column? I created a new variable, is_working, to indicate whether an applicant is working. I want to replace NA in occupation with zero when is_working is zero for the same observation, and leave the other NA values as they are.

df <- data.frame(occupation = c("NA","NA","Drivers","Accountants","NA","Drivers","Laborers","Cleaning staff","Drivers","Drivers"),
                 is_working = c("1","0","1","1","1","1","1","1","1","1"))

Solution

    library(dplyr)
    df %>%
      mutate(
        # change string "NA" to missing values NA
        occupation = ifelse(occupation == "NA", NA, occupation),
        # replace NAs where is_working is "0" with "0" (is_working is a character column)
        occupation = ifelse(is.na(occupation) & is_working == "0", "0", occupation)
      )
    #        occupation is_working
    # 1            <NA>          1
    # 2               0          0
    # 3         Drivers          1
    # 4     Accountants          1
    # 5            <NA>          1
    # 6         Drivers          1
    # 7        Laborers          1
    # 8  Cleaning staff          1
    # 9         Drivers          1
    # 10        Drivers          1
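
The same result can also be reached without dplyr. Below is a minimal base R sketch, assuming (as in the example data) that missing occupations are stored as the string "NA" and that is_working is a character column. The names `filled` and `not_working` are only illustrative.

```r
# Example data from the question: "NA" here is a string, not a missing value.
# stringsAsFactors = FALSE keeps the columns as character on R < 4.0 as well.
df <- data.frame(
  occupation = c("NA", "NA", "Drivers", "Accountants", "NA",
                 "Drivers", "Laborers", "Cleaning staff", "Drivers", "Drivers"),
  is_working = c("1", "0", "1", "1", "1", "1", "1", "1", "1", "1"),
  stringsAsFactors = FALSE
)

filled <- df  # work on a copy so the original data frame stays untouched

# Turn the string "NA" into a real missing value
filled$occupation[filled$occupation == "NA"] <- NA

# Fill missing occupations with "0" only where the applicant is not working
not_working <- is.na(filled$occupation) & filled$is_working == "0"
filled$occupation[not_working] <- "0"

filled
```

Because occupation stays a character column, the filled value is the string "0" rather than the number 0. If the real data already contains proper NA values instead of the string "NA", the first replacement step can probably be skipped.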