Tags: r, voting, categorical

How to choose the most voted category from multiple columns in R


I have a classification problem I need to solve using R, but to be honest I have no idea how to do it.

I have a table (see below) where different samples are classified by three ML models (one per column), and I need to choose the "most voted" category for each case and write it to a new column.

Current table: [image not available]

Desired output: [image not available]

I have been reading about categorical variables in R, but nothing seems to fit my specific needs.

Any help would be highly appreciated.

Thanks in advance.

JL


Solution

  • This is not how to ask a question here. Please see the relevant thread, and in future provide the data in the form shown below (run dput() on your data frame and copy and paste the result from the console). At any rate, here is a base R solution:

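    # Calculate the modal (most voted) category per row: mode => character vector
    # Use only the model prediction columns as votes, so an existing
    # "mode" column (as in the data below) is not counted as a fourth vote
    pred_cols <- c("RFpred", "SVMpred", "KNNpred")
    df1$mode <- apply(
      df1[, pred_cols],
      1,                        # apply the function to each row
      function(x){
        # count each category, sort the counts in decreasing order,
        # and keep the name of the first (most frequent) one
        head(
          names(
            sort(
              table(x),
              decreasing = TRUE
            )
          ),
          1
        )
      }
    )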
    

    Data:

    df1 <- structure(list(samples = c("S1", "D4", "S2", "D1", "D2", "S3", 
    "D3", "S4"), RFpred = c("Carrier", "Absent", "Helper", "Helper", 
    "Carrier", "Absent", "Resistant", "Carrier"), SVMpred = c("Absent", 
    "Absent", "Helper", "Helper", "Carrier", "Helper", "Helper", 
    "Resistant"), KNNpred = c("Carrier", "Absent", "Carrier", "Helper", 
    "Carrier", "Absent", "Helper", "Resistant"), mode = c("Carrier", 
    "Absent", "Helper", "Helper", "Carrier", "Absent", "Helper", 
    "Resistant")), row.names = c(NA, -8L), class = "data.frame")
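The apply() solution above always returns exactly one label, so when all three models disagree it simply returns one of the tied categories. Below is a minimal sketch of a tie-aware variant, assuming you would rather flag such rows with NA. `majority_vote` and `df2` are hypothetical names, and the sample X1 is made-up data added only to produce a three-way tie.

```r
# Sketch of a tie-aware majority vote (majority_vote is a hypothetical name).
# Returns the single most voted category, or NA when several categories
# share the highest count (e.g. three models predicting three different labels).
majority_vote <- function(votes) {
  counts <- table(votes)                            # votes per category
  winners <- names(counts)[counts == max(counts)]   # all top-voted categories
  if (length(winners) == 1) winners else NA_character_
}

# S1 and D4 are taken from df1 above; X1 is a made-up row that
# produces a three-way tie.
df2 <- data.frame(
  samples = c("S1", "D4", "X1"),
  RFpred  = c("Carrier", "Absent", "Carrier"),
  SVMpred = c("Absent", "Absent", "Helper"),
  KNNpred = c("Carrier", "Absent", "Resistant"),
  stringsAsFactors = FALSE
)

pred_cols <- c("RFpred", "SVMpred", "KNNpred")
df2$mode <- apply(df2[, pred_cols], 1, majority_vote)
print(df2)
```

S1 and D4 get the same labels as with the original code, while X1 gets NA, so ambiguous cases can be reviewed separately instead of being resolved by sort order.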