
Why doesn't read.csv import blank cells as NA?


Why doesn't read.csv() interpret the blank cells in the linked .csv file as NA? It seems to ignore them whether the columns in the data frame are "character" or "factor" type.

Data: https://drive.google.com/file/d/1muEulkPNw2XrGERAO6axPjgIyJ5UMsWt/view?usp=sharing

For example, no NA values appear in either of the following:

dat <- read.csv("test.csv")

# character type
is.na(dat$var)
is.na(dat$var1)

# factor type
dat$var_f <- factor(dat$var)
dat$var_f1 <- factor(dat$var1)

is.na(dat$var_f)
is.na(dat$var_f1)

Solution

  • As @lroha says in the comments, an empty string is usually a legal value in character input, so it can't be assumed to be a missing value.

    From ?read.csv:

    na.strings: a character vector of strings which are to be interpreted as ‘NA’ values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields.

    [Note that the list of types in the last sentence doesn't include character or factor.]

    By specifying na.strings = "" (or, more safely, na.strings = c("", "NA"), which keeps "NA" as a missing value as well), you're letting R know that you consider an empty string to be a missing value.

    Note that if you want "blank" to cover cells containing only whitespace (spaces, tabs, etc.) as well as "empty" (zero-length string) cells, you have to be more creative: unfortunately, na.strings doesn't accept a regular expression (see e.g. "How to remove blank cells from dataframe?").
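To make the points above concrete, here is a minimal sketch. It does not use the linked file: csv_text is a hypothetical stand-in with one empty cell and one whitespace-only cell, and blank_to_na is an illustrative helper, not part of base R. The strip.white argument is not mentioned in the answer above; it is one possible way to handle whitespace-only cells at read time, shown alongside a post-hoc regular-expression replacement.

```r
# Hypothetical stand-in for test.csv: row 2 has an empty `var` cell and a
# whitespace-only (two spaces) `var1` cell.
csv_text <- "var,var1,id\na,x,1\n,  ,2\nc,z,3"

# Default behaviour: blanks in character columns are kept as strings, not NA.
dat_default <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Treat zero-length strings (and the literal "NA") as missing.
# The whitespace-only cell is still kept, because it is not exactly "".
dat_empty <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                      na.strings = c("", "NA"))

# Option 1 (assumption: stripping leading/trailing whitespace from every
# unquoted field is acceptable): strip first, so whitespace-only cells
# become "" and then match na.strings.
dat_strip <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                      na.strings = c("", "NA"), strip.white = TRUE)

# Option 2: read as-is, then replace empty or whitespace-only strings with NA
# in character columns only. `blank_to_na` is a hypothetical helper name.
blank_to_na <- function(df) {
  df[] <- lapply(df, function(x) {
    if (is.character(x)) x[grepl("^[[:space:]]*$", x)] <- NA
    x
  })
  df
}
dat_regex <- blank_to_na(dat_default)

is.na(dat_default$var)   # no NA values
is.na(dat_empty$var)     # the empty cell is now NA
is.na(dat_empty$var1)    # the whitespace-only cell is not
is.na(dat_strip$var1)    # NA after stripping
is.na(dat_regex$var1)    # NA after the regex replacement
```

Option 1 changes every unquoted field (for example, " a " becomes "a"), while Option 2 leaves non-blank values untouched, so which one fits depends on whether surrounding whitespace in the data is meaningful.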