r, data.table, numeric, na

Replace NA with 0, only in numeric columns in data.table


I have a data.table with columns of different data types. My goal is to select only the numeric columns and replace the NA values in those columns with 0. I am aware that replacing NA values with zero goes like this:

DT[is.na(DT)] <- 0

To select only numeric columns, I found this solution, which works fine:

DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]

I can achieve what I want by assigning

DT2 <- DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]

and then doing:

DT2[is.na(DT2)] <- 0

But of course I would like to have my original DT modified by reference. With the following, however:

DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE]
                 [is.na(DT[, as.numeric(which(sapply(DT,is.numeric))), with = FALSE])]<- 0

I get

"Error in [.data.table([...] i is invalid type (matrix)"

What am I missing? Any help is much appreciated!!


Solution

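  • We can use set(), which assigns by reference

    for(j in seq_along(DT)){
        # rows that are NA in column j; the index is empty when the column is not numeric
        set(DT, i = which(is.na(DT[[j]]) & is.numeric(DT[[j]])), j = j, value = 0)
    }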
    

    Or create an index of the numeric columns, loop through it, and set the NA values to 0

    ind <- which(sapply(DT, is.numeric))
    for(j in ind){
        set(DT, i = which(is.na(DT[[j]])), j = j, value = 0)
    }
    

    data

    library(data.table)
    set.seed(24)
    DT <- data.table(v1 = c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3 = c(rnorm(4), NA))
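
Putting the pieces above together, here is a minimal end-to-end sketch of the second approach run on the example data. The names `num_cols` and `addr_before` are introduced here for illustration only. The `address()` check is just one way to confirm that DT was modified in place rather than copied.

```r
library(data.table)

# Example data from the answer: integer, character and double columns, each with one NA
set.seed(24)
DT <- data.table(v1 = c(NA, 1:4), v2 = c(NA, LETTERS[1:4]), v3 = c(rnorm(4), NA))

# Remember where DT lives in memory before modifying it
addr_before <- address(DT)

# Positions of the numeric (integer or double) columns
num_cols <- which(sapply(DT, is.numeric))

# Overwrite the NA rows of each numeric column by reference
for (j in num_cols) {
  set(DT, i = which(is.na(DT[[j]])), j = j, value = 0)
}

print(DT)

# Expected to be TRUE, since set() should not copy DT
same_object <- identical(address(DT), addr_before)
print(same_object)
```

The character column v2 is not touched, so its NA stays in place; only v1 and v3 have their NA values replaced.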