So, I am working on a ground truth for my machine learning algorithm in R. Five people classified pictures for me. See the data frame in the picture below.
[Image: screenshot of the data frame (Dataframe.PNG)]
The results seem to depend quite a lot on the classifier. What I want to do now is find a common ground truth. To do that, I want to adjust by majority vote for most cases: if at least 3 of the 5 classifiers assigned a picture to the same condition, the other two, which differ, should switch to this majority condition. In the rare case where there is no majority condition, I want to recheck the picture and make the decision myself.
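A minimal base-R sketch of this rule, assuming five classifier columns named `class1` to `class5` holding numeric labels. The helper `majority_label()` and the small `ratings` data frame are hypothetical, made up only to illustrate the decision: accept a label that at least 3 classifiers chose, otherwise return `NA` so the picture is flagged for a manual recheck.

```r
# Hypothetical helper: return the label chosen by at least `min_votes`
# classifiers, or NA if no label reaches that threshold.
majority_label <- function(labels, min_votes = 3) {
  counts <- table(labels)
  top <- which.max(counts)
  if (counts[[top]] >= min_votes) names(counts)[top] else NA_character_
}

# Made-up ratings: one row per picture, one column per classifier
ratings <- data.frame(
  class1 = c(1, 2, 3),
  class2 = c(1, 2, 4),
  class3 = c(1, 5, 5),
  class4 = c(2, 2, 1),
  class5 = c(1, 3, 2)
)

# Apply the rule row by row; NA means "no majority, check by hand"
ratings$truth <- apply(ratings[, paste0("class", 1:5)], 1, majority_label)
ratings$recheck <- is.na(ratings$truth)
print(ratings)
```

With five classifiers and a threshold of 3, at most one label can reach the threshold, so the first-maximum behaviour of `which.max()` never has to break a tie between two accepted labels.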
Below is a reproducible example:
```r
# Reproducible example: 10 pictures, each rated by 5 classifiers
data <- data.frame(Pic = character(10), class1 = numeric(10), class2 = numeric(10),
                   class3 = numeric(10), class4 = numeric(10),
                   class5 = numeric(10), check = numeric(10))
data$Pic <- 1:10
set.seed(1234)
# Each sample of 5 labels is recycled to fill the 10 rows
data$class1 <- sample(1:5, 5)
data$class2 <- sample(1:5, 5)
data$class3 <- sample(1:5, 5)
data$class4 <- sample(1:5, 5)
data$class5 <- sample(1:5, 5)
# Make picture 5 unanimous and change two individual ratings
data[5, 2:6] <- 5
data[1, 5] <- 4
data[9, 5] <- 2
# "Good" only if all five classifiers agree, otherwise "delta"
data$check <- ifelse(data$class1 == data$class2 & data$class1 == data$class3 &
                     data$class1 == data$class4 & data$class1 == data$class5,
                     "Good", "delta")
```
Any help is warmly welcome.
I found the answer myself; this does the trick:
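```r
library(data.table)
# Most frequent label among class1..class5 per picture, and how many chose it
setDT(data)[, c("Most_Frequent", "Count") := {tbl <- table(unlist(.SD))
  .(names(tbl)[which.max(tbl)], max(tbl))}, by = Pic, .SDcols = paste0("class", 1:5)]
```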
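A possible follow-up, sketched under the assumption that the classifier columns are numeric and named `class1` to `class5`; the toy table `dt` and the `Recheck` column are made up for illustration. It uses `Count` to decide between accepting the majority and a manual recheck, then overwrites the dissenting labels with the majority label. Where `Count < 3`, `Most_Frequent` is only the first of the tied labels in `table()` order, so it should not be used as ground truth for those pictures.

```r
library(data.table)

# Made-up ratings: one row per picture, one column per classifier
dt <- data.table(
  Pic    = 1:3,
  class1 = c(1, 2, 3),
  class2 = c(1, 2, 4),
  class3 = c(1, 5, 5),
  class4 = c(2, 2, 1),
  class5 = c(1, 3, 2)
)
cls <- paste0("class", 1:5)

# Most frequent label per picture and how many classifiers chose it
dt[, c("Most_Frequent", "Count") := {
  tbl <- table(unlist(.SD))
  .(names(tbl)[which.max(tbl)], max(tbl))
}, by = Pic, .SDcols = cls]

# At least 3 of 5 agree: accept the majority; otherwise flag for manual review
dt[, Recheck := Count < 3]

# Overwrite dissenting labels with the majority label where one exists
rows <- which(!dt$Recheck)
for (col in cls) {
  set(dt, i = rows, j = col, value = as.numeric(dt$Most_Frequent[rows]))
}
print(dt)
```

The pictures with `Recheck == TRUE` keep their original five labels, so they can be reviewed by hand afterwards.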