rimport-from-excel

Non-NA values being replaced as NA during Excel File Import into R


I have an excel file which I have imported into R using read_excel. The excel file contains about 8000+ records with 58 columns. There are many cells with NA values. Across the different variables, they may be entered as "n/a", "N/A", "n/A", "N/a","na","NA", "n a","(blank)".

When I'm importing the file, I use this code:

read_excel("path", col_names = TRUE, na = c("n/a", "N/A", "n/A", "N/a","na","NA", "n a",""))

In one column, I have records with this value NR, which stands for No Record and is not the same thing as NA.

When I run the read_excel code above, the NR becomes NA. If I take out the na = c("n/a", "N/A", "n/A", "N/a","na","NA", "n a","") section, the NR is imported as it should. How do I get R to recognize that NR is not the same thing? As a side note, I cannot change the NR notation to something else.


Solution

  • You can try something like this to fix the problem :

    ### Packages
    library(dplyr)
    library(stringr)
    
    ### Load your data with all columns typed to character
    ### which should preserved the NR value. Blank cells will be automatically converted to NA
    NA_ <- read_excel("C:/Users/your_name/Downloads/NA.xlsx",
                      col_names = TRUE,
                      col_types ='text')
    

    NA_ dataframe :

    # A tibble: 4 × 5
      Prenom Nom     Values Ville       Record
      <chr>  <chr>   <chr>  <chr>       <chr> 
    1 Jean   N/a     1      Paris       1     
    2 n/a    Dupond  2      NA          3     
    3 N/A    na      NA     Montpellier 4     
    4 Marc   Spencer n a    NA          NR  
    

    Time to transform all forms of n/a,N/A,... to real NA :

    ### Replace NA values with real NA using a regex
    ### Change afterwards the type of one or more columns if needed
    NA_ %>% mutate(across(everything(),~str_replace(.x,"(?i)\\b(n\\s*\\/?\\s*a)\\b",NA_character_)),
                   Values=as.numeric(Values))
    

    Output :

    # A tibble: 4 × 5
      Prenom Nom     Values Ville       Record
      <chr>  <chr>    <dbl> <chr>       <chr> 
    1 Jean   NA           1 Paris       1     
    2 NA     Dupond       2 NA          3     
    3 NA     NA          NA Montpellier 4     
    4 Marc   Spencer     NA NA          NR