Fill cell based on surrounding cells R


My initial matrix looks like this (the real one is much larger):

A NA A A A D D B NA B C NA C
A NA A B B D C A NA A A NA A
D NA D D A A A C NA C C NA C
structure(c("A", "A", "D", NA, NA, NA, "A", "A", "D", "A", "B", 
"D", "A", "B", "A", "D", "D", "A", "D", "C", "A", "B", "A", "C", 
NA, NA, NA, "B", "A", "C", "C", "A", "C", NA, NA, NA, "C", "A", 
"C"), .Dim = c(3L, 13L), .Dimnames = list(NULL, c("V1", "V2", 
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12", 
"V13")))

I want to replace each NA with the letter in the neighbouring cells (left and right) when those two letters are the same. That is, I want something like this:

A A A A A D D B B B C C C
A A A B B D C A A A A A A
D D D D A A A C C C C C C
structure(c("A", "A", "D", "A", "A", "D", "A", "A", "D", "A", 
"B", "D", "A", "B", "A", "D", "D", "A", "D", "C", "A", "B", "A", 
"C", "B", "A", "C", "B", "A", "C", "C", "A", "C", "C", "A", "C", 
"C", "A", "C"), .Dim = c(3L, 13L), .Dimnames = list(NULL, c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12", "V13")))

So, if both surrounding letters are the same, the NA should become that letter; otherwise, it should stay NA.

Any ideas?

Thank you very much.


Solution

  • Here is my approach, which uses no additional libraries:

    dat <- matrix(c('A',NA,'A','A',NA,'B',
                  'B',NA,'A','B',NA,'B',
                  'B',NA,NA,'B','B',NA
                  ),nrow=3,byrow=TRUE)
    
    t(apply(dat,1,function(x){
        ## indices of the non-NA elements in this row
        pos <- which(!is.na(x))
        ## if the indices of two consecutive non-NA elements differ by 2,
        ## exactly one NA lies between them -> potential match
        dif <- which(diff(pos)==2)
        ## skip rows with no potential match (would otherwise convert NA to "NA")
        if(length(dif)){ 
            ## fill the gap only if both neighbours are equal, otherwise keep NA
            x[pos[dif]+1] <- sapply(dif,function(y) ifelse(x[pos[y]]==x[pos[y]+2], x[pos[y]],NA))
        }
        x
    }))
    

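    Two questions remain open: how do you want to handle runs of consecutive NAs, and NAs at the margins?

    Here is a version that also handles runs of NAs. NAs at the margins are left unchanged, because only gaps between two non-NA elements are considered: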

    t(apply(dat,1,function(x){
        pos <- which(!is.na(x))
        ## if the indices of two consecutive non-NA elements differ by more than 1,
        ## one or more NAs lie between them -> potential match
        dif <- diff(pos)
        for(cur in which(dif>1)){
            if(x[pos[cur]]==x[pos[cur]+dif[cur]]){
                x[(pos[cur]+1):(pos[cur]+dif[cur])] <- x[pos[cur]]
            }
        }
        x
    }))
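
Since the real matrix is described as huge, a vectorised variant may be worth trying for the single-NA case. The following is only a sketch, not the approach from the answer above: it compares the matrix with copies of itself shifted one column to the left and to the right, instead of looping over rows with apply. The helper name fill_single_na is made up for this illustration, and the sketch deliberately leaves runs of NAs and NAs in the first or last column untouched.

```r
# Input matrix from the question (3 rows, 13 columns)
m <- structure(c("A", "A", "D", NA, NA, NA, "A", "A", "D", "A", "B",
"D", "A", "B", "A", "D", "D", "A", "D", "C", "A", "B", "A", "C",
NA, NA, NA, "B", "A", "C", "C", "A", "C", NA, NA, NA, "C", "A",
"C"), .Dim = c(3L, 13L), .Dimnames = list(NULL, c("V1", "V2",
"V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", "V12",
"V13")))

# fill_single_na is a hypothetical helper name, not part of base R
fill_single_na <- function(m) {
  n <- ncol(m)
  if (n < 3) return(m)                    # no interior columns to fill
  left  <- m[, 1:(n - 2), drop = FALSE]   # left neighbour of each interior cell
  mid   <- m[, 2:(n - 1), drop = FALSE]   # the interior cells themselves
  right <- m[, 3:n,       drop = FALSE]   # right neighbour of each interior cell
  # TRUE where the cell is NA and both neighbours are present and equal
  fill <- is.na(mid) & !is.na(left) & !is.na(right) & left == right
  mid[fill] <- left[fill]
  m[, 2:(n - 1)] <- mid
  m
}

out <- fill_single_na(m)
out
```

Because every comparison works on whole matrices at once, this avoids the per-row function calls of apply. The trade-off is that it temporarily holds three sub-matrices of nearly the same size as the input, so whether it is actually faster on a very large matrix would need to be measured.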