I have a spatial points data frame with characteristics of houses sold over several years. I appended neighborhood attributes to it using "over" in {sp}. For each year of my housing data, a neighborhood data set is joined.
The problem: neighborhood data for different years don't always contain the same variables. Therefore, when joined to the housing data, I obtain NAs in these non-shared variables for houses sold in some particular years.
Ideal solution: for each row in my data, replace each NA with the value from the same column (V1) and the same neighborhood (nb), taken from the closest year available (y).
[,y] [,nb] [,V1]
[1,] 1993 30000 2752
[2,] 1993 30000 2752
[3,] 1994 30000 NA
[4,] 1994 50000 2554
[5,] 1995 30000 NA
[6,] 1996 30000 2650
[7,] 1996 50000 NA
Ideally, replace NAs such that [3,V1] = 2752; [5,V1] = 2650; and [7,V1] = 2554. The data frame contains over 250k obs, so looping through the whole thing is rather cumbersome.
You can use the function below for your purpose.
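get_rid_of_NAs <- function(urmatrix) {
  # Expected columns: 1 = year (y), 2 = neighborhood (nb), 3 = value (V1)
  filled <- urmatrix
  for (i in seq_len(nrow(urmatrix))) {
    # Skip rows that already have a value; NA must be tested with is.na(),
    # because a comparison such as x != NA always evaluates to NA
    if (!is.na(urmatrix[i, 3])) {
      next
    }
    # Candidate donor rows: same neighborhood and a non-missing value
    donors <- which(urmatrix[, 2] == urmatrix[i, 2] & !is.na(urmatrix[, 3]))
    if (length(donors) == 0) {
      next  # no value for this neighborhood in any year, so leave the NA
    }
    # Take the donor whose year is closest; ties go to the earliest row
    nearest <- donors[which.min(abs(urmatrix[donors, 1] - urmatrix[i, 1]))]
    filled[i, 3] <- urmatrix[nearest, 3]
  }
  return(filled)
}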
Now call the function on your matrix (columns in the order y, nb, V1).
get_rid_of_NAs(ENTERYOURMATRIXHERE)
R can easily handle such a loop, so I would suggest the for loop in this case.
Seriously, there are many people here saying things like "there are 10 million rows, R can't handle it, etc." R is not Excel; R was created to handle data.
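As a quick check against the sample data in the question, here is a minimal base-R sketch that does the same fill one neighborhood at a time. The names `houses` and `fill_nearest_year` are made up for illustration, and it assumes that a tie between two equally distant years should go to the earlier row.

```r
# Sample data from the question: year (y), neighborhood (nb), value (V1)
houses <- data.frame(
  y  = c(1993, 1993, 1994, 1994, 1995, 1996, 1996),
  nb = c(30000, 30000, 30000, 50000, 30000, 30000, 50000),
  V1 = c(2752, 2752, NA, 2554, NA, 2650, NA)
)

# Hypothetical helper: within one neighborhood, fill each NA in `value`
# with the value observed in the closest year
fill_nearest_year <- function(year, value) {
  known <- which(!is.na(value))
  if (length(known) == 0) return(value)  # nothing to borrow from
  for (i in which(is.na(value))) {
    # which.min() returns the first minimum, so ties go to the earlier row
    value[i] <- value[known[which.min(abs(year[known] - year[i]))]]
  }
  value
}

# Process the row indices of each neighborhood separately
for (rows in split(seq_len(nrow(houses)), houses$nb)) {
  houses$V1[rows] <- fill_nearest_year(houses$y[rows], houses$V1[rows])
}

print(houses)
```

Because each neighborhood is handled on its own small slice, the inner loop only touches the NA rows of that slice, which should stay manageable for a few hundred thousand observations.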