r - Replace NAs with values according to two index vectors


I have a spatial points data frame with characteristics of houses sold over several years. I appended neighborhood attributes to it using "over" from {sp}. For each year of housing data, the neighborhood data set for that year is joined.

The problem: neighborhood data for different years don't always contain the same variables. Therefore, when joined to the housing data, I obtain NAs in these non-shared variables for houses sold in some particular years.

Ideal solution: for each row in my data, replace NAs with same column data (V1) from the same neighborhood (nb) but closest year available (y).

      [,y]  [,nb] [,V1]
 [1,] 1993 30000 2752
 [2,] 1993 30000 2752
 [3,] 1994 30000 NA
 [4,] 1994 50000 2554
 [5,] 1995 30000 NA
 [6,] 1996 30000 2650
 [7,] 1996 50000 NA

Ideally, replace NAs such that [3,V1] = 2752; [5,V1] = 2650, and [7,V1] = 2554. The data frame contains over 250k obs so looping through the whole thing is rather cumbersome.


Solution

  • You can use the function below for your purpose.

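    get_rid_of_NAs <- function(urmatrix) {
      # Columns: 1 = year (y), 2 = neighborhood (nb), 3 = value (V1)
      filled <- urmatrix

      for (i in seq_len(nrow(urmatrix))) {
        # Skip rows that already have a value (test with is.na(), never != NA)
        if (!is.na(urmatrix[i, 3])) {
          next
        }
        # Candidate donors: same neighborhood, value present
        donors <- which(urmatrix[, 2] == urmatrix[i, 2] & !is.na(urmatrix[, 3]))
        if (length(donors) == 0) {
          next  # no donor in this neighborhood; leave the NA
        }
        # Pick the donor with the closest year (ties go to the earlier row)
        index <- donors[which.min(abs(urmatrix[donors, 1] - urmatrix[i, 1]))]
        filled[i, 3] <- urmatrix[index, 3]
      }
      return(filled)
    }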
    

    Now call the function on your matrix.

    get_rid_of_NAs(ENTERYOURMATRIXHERE)
    

    R can easily handle such a loop, so I would suggest a plain for loop in this case.

    Seriously, many people here say things like "there are 10 million data points, R can't handle it," and so on. R is not Excel; R was created to handle data.
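
If the row-by-row loop turns out to be slow on 250k rows, one alternative is to do the search once per neighborhood instead of once per row. The following is only a sketch under assumptions: `houses` and `fill_nearest_year` are illustrative names that do not appear in the question, the data is taken to be a plain data frame with columns `y`, `nb` and `V1`, and the years are assumed to contain no NAs. Ties between equally close years go to whichever donor row comes first.

```r
# Example data from the question: year (y), neighborhood (nb), value (V1)
houses <- data.frame(
  y  = c(1993, 1993, 1994, 1994, 1995, 1996, 1996),
  nb = c(30000, 30000, 30000, 50000, 30000, 30000, 50000),
  V1 = c(2752, 2752, NA, 2554, NA, 2650, NA)
)

# Hypothetical helper: fill NAs in `value` from the nearest year within `group`
fill_nearest_year <- function(year, group, value) {
  filled <- value
  # Row indices split by neighborhood, so each search stays within one group
  for (rows in split(seq_along(value), group)) {
    have <- rows[!is.na(value[rows])]  # donor rows
    need <- rows[is.na(value[rows])]   # rows to fill
    if (length(have) == 0 || length(need) == 0) next
    # Absolute year difference between every row to fill and every donor row
    gap <- abs(outer(year[need], year[have], "-"))
    # Nearest donor per row; ties go to the first donor listed
    nearest <- have[apply(gap, 1, which.min)]
    filled[need] <- value[nearest]
  }
  filled
}

houses$V1 <- fill_nearest_year(houses$y, houses$nb, houses$V1)
print(houses)
```

On the example data this fills rows 3, 5 and 7 with 2752, 2650 and 2554, as requested in the question. Neighborhoods with no non-NA value at all keep their NAs. The `outer()` call builds a rows-to-fill by donors matrix per neighborhood, so very large neighborhoods may need a more memory-friendly lookup.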