I am trying to implement a Bayesian model in R using the BAS package, with these settings for my model:
databas <- bas.lm(at_areabuilding ~ ., data = dataCOMMA, method = "MCMC", prior = "ZS-null", modelprior = uniform())
I am trying to predict the area for a given state from the areas recorded for that state at different zip codes. My model finds the zip codes present in the data for a given state (using a state index for this) and then returns the prediction.
Whenever I want to predict the area of a state, I give this input:
> UT <- data.frame(zip = 84321, loc_st_prov_cd = "UT" ,state_idx = 7)
> predict_1 <- predict(databas,UT, estimator="BMA", interval = "predict", se.fit=TRUE)
> data.frame('state' = 'UT','estimated area' = predict_1$Ybma)
This gives me the output for that one state. Now suppose I have a list of states with their zip codes and I want to run my model (databas) on the whole list to get predictions. Doing it one state at a time as above takes too long. Is there another way? With help from someone else I wrote this code:
pred <- sapply(1:nrow(first), function(row) {
  predict(databas, first[row, ], estimator = "BMA", interval = "predict", se.fit = TRUE)$Ybma
})
Here databas is my model and first is the new dataset for which I am predicting area. The problem is that this code takes a long time: it iterates over every row and calls predict() once per row. My dataset has 150,000 rows, and I would appreciate any help optimizing the performance of this code.
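A common fix for this kind of slowdown is to call predict() once on the whole new data frame instead of once per row. For the BAS model that would presumably be predict(databas, first, estimator = "BMA")$Ybma, which should return one prediction per row of first; that is the expected behaviour of the BAS predict method, not something exercised here. Because BAS may not be installed everywhere, the sketch below shows the same idea with base R's lm() as a stand-in, on synthetic data (the values, sizes and chunk size are made up for illustration). The row-by-row loop, a single vectorized call, and a chunked variant all give the same numbers.

```r
# Synthetic stand-in data: column names mirror the question, values are made up
set.seed(42)
n_train <- 200
train <- data.frame(
  zip = sample(84001:84999, n_train, replace = TRUE),
  state_idx = sample(1:10, n_train, replace = TRUE)
)
train$area <- 50 + 0.001 * train$zip + 3 * train$state_idx + rnorm(n_train)

# Stand-in model: base R lm() instead of BAS::bas.lm()
fit <- lm(area ~ ., data = train)

new_data <- data.frame(
  zip = sample(84001:84999, 1000, replace = TRUE),
  state_idx = sample(1:10, 1000, replace = TRUE)
)

# Slow pattern from the question: one predict() call per row
loop_time <- system.time(
  pred_loop <- sapply(seq_len(nrow(new_data)), function(row) {
    predict(fit, newdata = new_data[row, ])
  })
)

# Faster pattern: a single predict() call on the whole data frame
vec_time <- system.time(
  pred_vec <- predict(fit, newdata = new_data)
)

# Middle ground if one call on every row uses too much memory:
# predict in chunks of 250 rows and concatenate in the original order
row_ids <- seq_len(nrow(new_data))
chunks <- split(row_ids, ceiling(row_ids / 250))
pred_chunked <- unlist(
  lapply(chunks, function(idx) predict(fit, newdata = new_data[idx, ])),
  use.names = FALSE
)

cat("row-by-row elapsed:", loop_time[["elapsed"]], "s\n")
cat("single call elapsed:", vec_time[["elapsed"]], "s\n")
```

The chunked version is only worth it if a single call on all 150,000 rows runs into memory limits; otherwise the one-call form is the simplest.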
Something like this will iterate over each row of your data frame of states, zip codes and indices (let's call it states_and_zips) and return a list of predictions. Each element of this list (which I've called pred) corresponds to the matching row of states_and_zips:
pred <- lapply(seq_len(nrow(states_and_zips)), function(row) {
  # pass the row itself as newdata (no formula tilde)
  predict(databas, states_and_zips[row, ],
          estimator = "BMA", interval = "predict", se.fit = TRUE)$Ybma
})
If Ybma is a single value, use sapply instead of lapply: it returns a vector of predictions, one for each row of states_and_zips, which you can add directly as a new column of states_and_zips.
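To make the lapply/sapply difference concrete, here is a small sketch. predict_one() is a hypothetical stand-in for predict(databas, one_row, estimator = "BMA")$Ybma, chosen only so the example runs without BAS, and the three rows are made-up inputs. lapply returns a list, sapply simplifies it to a numeric vector that can be stored as a column, and vapply is a stricter variant that errors if any row does not return exactly one number.

```r
# Made-up input rows shaped like the question's UT example
states_and_zips <- data.frame(
  zip = c(84321, 84101, 84604),
  loc_st_prov_cd = c("UT", "UT", "UT"),
  state_idx = c(7, 7, 7)
)

# Hypothetical stand-in for predict(databas, one_row, estimator = "BMA")$Ybma
predict_one <- function(one_row) {
  1000 + 0.01 * one_row$zip
}

rows <- seq_len(nrow(states_and_zips))

# lapply: a list with one element per row
pred_list <- lapply(rows, function(row) predict_one(states_and_zips[row, ]))

# sapply: simplified to a numeric vector because each element has length 1
pred_vec <- sapply(rows, function(row) predict_one(states_and_zips[row, ]))

# vapply: same result, but checks that every row returns exactly one number
pred_strict <- vapply(rows, function(row) predict_one(states_and_zips[row, ]), numeric(1))

# Store the predictions as a new column, one per row
states_and_zips$estimated_area <- pred_vec
print(states_and_zips)
```

If you have already produced the list with lapply, unlist(pred_list) gives the same vector.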