I have the following code that selects 4 rows of iris 1000x, and takes the mean of each 4 row sample:
library(dplyr)
iris<- iris
storage<- list()
counter<- 0
for (i in 1:1000) {
# sample 3 randomly selected transects 100 time
tempsample<- iris[sample(1:nrow(iris), 4, replace=F),]
storage[[i]]=tempsample
counter<- counter+1
print(counter)
}
# Unpack results into dataframe
results<- do.call(rbind, storage)
View(results)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/4)),each = 4))
# View(results_2)
final_results<- aggregate(results_2[,1:4], list(results_2$Aggregate), mean)
# View(final_results)
I want to calculate the bias of each column in relation to their true population parameter. For example using SimDesign
's bias()
:
library(SimDesign)
(bias(final_results[,2:5], parameter=c(5,3,2,1), type='relative'))*100
In this code, the values of parameter are hypothetical true pop. values of each column in the dataframe. I want to do this process 100x to get a distribution of bias estimates for each variable in the dataframe. However, I'm not sure how to fit all of this into a for loop (what I think would be the way to go) so the final output is a dataframe with 100 rows of bias measurements for each iris variable.
Any help with this would be greatly appreciated.
#------------------------------
Update
Trying to run the same code for a stratified sample as opposed to a random sample gives me the following error: *Error in [.data.table
(setDT(copy(iris)), as.vector(sapply(1:1000, function(X) stratified(iris, :
i is invalid type (matrix). Perhaps in future a 2 column matrix could return a list of elements of DT * I think this might be related to setDT?
This is a result of the following code:
do.call(rbind,lapply(1:100, function(x) {
bias(
setDT(copy(iris))[as.vector(sapply(1:1000, function(X) stratified(iris,group="Species", size=1)))][
, lapply(.SD, mean), by=rep(c(1:1000),4), .SDcols=c(1:4)][,c(2:5)],
parameter=c(5,3,2,1),
type='relative'
)
}))
I looked into using the following code which was suggested:
get_samples <- function(n, sampsize=4) {
rbindlist(lapply(1:n, function(x) {
splitstackshape::stratified(iris, group="Species",sampsize)[, id:=x] }))[
, lapply(.SD, mean), by=.(Species, id)] }
I think I understand what this function is doing (selecting 4 stratified rows of iris, taking the means of each column by species), but I'm not sure how to apply it to the original question of doing it (4*1000)*100 to test the bias (I'm very new at this so apologies if I'm missing something obvious).
Here is one way to do that. I've made some minor changes to your code, and wrapped it in a function. Then, use lapply
over a sequence say 1:10
or 1:100
, each time running your function, and feeding the result to your bias
function from the SimDesign
package. Then row bind the resulting list
library(dplyr)
get_samples <- function(df, size=4, n=1000) {
storage<- list()
counter<- 0
for (i in 1:1000) {
tempsample<- df[sample(1:nrow(df), size, replace=F),]
storage[[i]]=tempsample
counter<- counter+1
}
results<- do.call(rbind, storage)
results_2<- as.data.frame(results)
results_2<- results_2 %>% mutate(Aggregate = rep(seq(1,ceiling(nrow(results_2)/size)),each = size))
final_results<- aggregate(results_2[,1:size], list(results_2$Aggregate), mean)
return(final_results)
}
iris=iris
replicates = lapply(1:10, function(x) {
result = get_samples(iris)
(bias(result[,2:5], parameter=c(5,3,2,1), type='relative'))*100
})
replicates = do.call(rbind, replicates)
Output:
Sepal.Length Sepal.Width Petal.Length Petal.Width
[1,] 41.50617 3.292500 86.77408 8.859333
[2,] 43.26058 2.763500 90.20758 10.825917
[3,] 43.46642 3.551750 90.11767 10.576250
[4,] 41.94683 2.970833 86.89625 8.817000
[5,] 42.08733 3.380917 86.78642 8.996667
[6,] 42.13050 2.942250 88.02983 9.707500
[7,] 43.07383 2.775500 89.04583 10.102083
[8,] 44.10192 2.895167 91.27208 11.188500
[9,] 41.29408 2.314750 87.59208 9.244333
[10,] 42.77450 2.781583 90.37342 10.789500
library(SimDesign)
library(data.table)
do.call(rbind,lapply(1:100, function(x) {
bias(
setDT(copy(iris))[as.vector(sapply(1:1000, function(X) sample(1:nrow(iris),4)))][
, lapply(.SD, mean), by=rep(c(1:1000),4), .SDcols=c(1:4)][,c(2:5)],
parameter=c(5,3,2,1),
type='relative'
)
}))