I have a classification problem I need to solve using R, but to be honest I have no idea how to do it.
I have a table (see below) where different samples are classified by three ML models (one per column), and I need to choose the "most voted" category for each case and write it to a new column.
Current table
Desired Output
I have been reading about categorical variables in R, but nothing seems to fit my specific needs.
Any help would be highly appreciated.
Thanks in advance.
JL
This is not how you ask a question. Please see the relevant thread, and in the future provide your data in the form shown below (run dput() on the data frame, then copy and paste the result from the console). At any rate, here is a base R solution:
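# Calculate the modal value of each row: mode => character vector
df1$mode <- apply(
  df1[, colnames(df1) != "samples"],  # every column except the sample IDs
  1,                                  # apply the function row by row
  function(x) {
    # Count each category, sort by count, and keep the most frequent name
    head(
      names(
        sort(
          table(x),
          decreasing = TRUE
        )
      ),
      1
    )
  }
)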
Data:
df1 <- structure(list(samples = c("S1", "D4", "S2", "D1", "D2", "S3",
"D3", "S4"), RFpred = c("Carrier", "Absent", "Helper", "Helper",
"Carrier", "Absent", "Resistant", "Carrier"), SVMpred = c("Absent",
"Absent", "Helper", "Helper", "Carrier", "Helper", "Helper",
"Resistant"), KNNpred = c("Carrier", "Absent", "Carrier", "Helper",
"Carrier", "Absent", "Helper", "Resistant"), mode = c("Carrier",
"Absent", "Helper", "Helper", "Carrier", "Absent", "Helper",
"Resistant")), row.names = c(NA, -8L), class = "data.frame")
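
With three models and four possible categories, a row can end up with no majority at all (each model voting differently). The following is a minimal sketch, not part of the solution above, of one way to make such ties visible instead of silently picking a category. The helper name majority_vote, the data frame preds, the vector pred_cols, and the sample "X1" are all made up for illustration; only the column names are taken from the data above.

```r
# Hypothetical helper: majority vote over a character vector of predictions.
# Returns NA when the highest count is shared by more than one category.
majority_vote <- function(x) {
  counts <- table(x)
  winners <- names(counts)[as.vector(counts) == max(counts)]
  if (length(winners) == 1) winners else NA_character_
}

# Small self-contained example; "X1" is an invented sample with a 3-way tie.
preds <- data.frame(
  samples = c("S1", "D4", "S4", "X1"),
  RFpred  = c("Carrier", "Absent", "Carrier", "Carrier"),
  SVMpred = c("Absent", "Absent", "Resistant", "Absent"),
  KNNpred = c("Carrier", "Absent", "Resistant", "Helper"),
  stringsAsFactors = FALSE
)

# Name the prediction columns explicitly so that an existing result column
# is never counted as an extra vote if the code is run a second time.
pred_cols <- c("RFpred", "SVMpred", "KNNpred")
preds$mode <- apply(preds[, pred_cols], 1, majority_vote)
```

In this sketch the first three rows should get the category that two of the three models agree on, and the invented "X1" row should get NA because every model disagrees. Whether NA is the right answer for a tie, or whether one model should act as a tie-breaker, depends on your use case.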