I have an excel file which I have imported into R using read_excel
. The excel file contains about 8000+ records with 58 columns. There are many cells with NA values. Across the different variables, they may be entered as "n/a", "N/A", "n/A", "N/a","na","NA", "n a","(blank)".
When I'm importing the file, I use this code:
read_excel("path", col_names = TRUE, na = c("n/a", "N/A", "n/A", "N/a","na","NA", "n a",""))
In one column, I have records with this value NR
, which stands for No Record and is not the same thing as NA
.
When I run the read_excel
code above, the NR
becomes NA
. If I take out the na = c("n/a", "N/A", "n/A", "N/a","na","NA", "n a","")
section, the NR
is imported as it should. How do I get R to recognize that NR
is not the same thing? As a side note, I cannot change the NR
notation to something else.
You can try something like this to fix the problem :
### Packages
library(dplyr)
library(stringr)
### Load your data with all columns typed to character
### which should preserved the NR value. Blank cells will be automatically converted to NA
NA_ <- read_excel("C:/Users/your_name/Downloads/NA.xlsx",
col_names = TRUE,
col_types ='text')
NA_ dataframe :
# A tibble: 4 × 5
Prenom Nom Values Ville Record
<chr> <chr> <chr> <chr> <chr>
1 Jean N/a 1 Paris 1
2 n/a Dupond 2 NA 3
3 N/A na NA Montpellier 4
4 Marc Spencer n a NA NR
Time to transform all forms of n/a,N/A,... to real NA :
### Replace NA values with real NA using a regex
### Change afterwards the type of one or more columns if needed
NA_ %>% mutate(across(everything(),~str_replace(.x,"(?i)\\b(n\\s*\\/?\\s*a)\\b",NA_character_)),
Values=as.numeric(Values))
Output :
# A tibble: 4 × 5
Prenom Nom Values Ville Record
<chr> <chr> <dbl> <chr> <chr>
1 Jean NA 1 Paris 1
2 NA Dupond 2 NA 3
3 NA NA NA Montpellier 4
4 Marc Spencer NA NA NR