r dataframe

Create a new column with the column name of the max value for each row for a *selected* number of columns in R


I have a data frame with the number of votes that parties and their coalitions received in each town. It has 44 columns in total, including others such as the town's name and code and the number of sections.

Using only a subset of those columns, I want to create a new column whose value, for each row, is the name of the column that holds that row's maximum.

Specifically, I want the columns in positions 7 to 29 plus some specific columns selected by name. So I need not only the range 7:29 but also, for example, the columns called "A" and "B". Does anybody know how I can do this?

I have tried the code given in other answers, but it does not work. I've tried:

cdmx <- cdmx %>%
  mutate(winner = names(cdmx)[max.col(cdmx[, c(7:29, "coal_pri_pvem", "coal_prd_pt_nva_alianza", "coal_prd_pt")], ties.method = "first")])

Yet this error occurs (even though I have 44 columns in total):

Error in `mutate()`:
ℹ In argument: `winner = ...[]`.
ℹ In row 1.
Caused by error in `cdmx[, c(7:29, "coal_pri_pvem",
  "coal_prd_pt_nva_alianza", "coal_prd_pt")]`:
! Can't subset columns that don't exist.
✖ Columns `7`, `8`, `9`, `10`, `11`, etc. don't exist.

I have also tried the following code:

cdmx$winner <- colnames(cdmx)[apply(cdmx[, c(7:num_columnas, "coal_pri_pvem", "coal_prd_pt_nva_alianza", "coal_prd_pt")], 1, function(x) which.max(x, na.rm = TRUE))]

Yet the same error happens:

Error in cdmx[, c(7:29, "coal_pri_pvem",
  "coal_prd_pt_nva_alianza", "coal_prd_pt")]:
! Can't subset columns that don't exist.
✖ Columns 7, 8, 9, 10, 11, etc. don't exist.

I've looked on many sites but found no solution. Does anybody know how I can fix this? Thanks!


Solution

  • Since you have not provided any data, it is difficult to give you an exact answer, but maybe this will give you a hint.

    Firstly, you cannot mix indexes and names in a single subsetting vector; you have to choose one method or the other. c() coerces a mix of numbers and strings to character, so 7:29 becomes "7", "8", ..., and those strings are then looked up as column names. That is why the error says columns `7`, `8`, `9` don't exist.

    You can do mtcars[, c(3:6)] or mtcars[, c("gear", "hp")], but you cannot mix the two, i.e. mtcars[, c(3:6, "gear", "hp")] returns an error. So in your case, cdmx[, c(7:29, "coal_pri_pvem", "coal_prd_pt_nva_alianza", "coal_prd_pt")] is not correct. If you cannot hardcode the column numbers for those columns, you can look them up with match().
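To see the coercion described above, here is a minimal sketch on the built-in mtcars data. The variable names (idx, mixed_fails, pos, selected) are only illustrative, and 3:4 with "gear" stand in for your own positions and names.

```r
# Mixing numbers and strings in c() coerces everything to character
idx <- c(3:4, "gear")
print(idx)  # the positions are now the strings "3" and "4"

# The stringified positions are looked up as column names, so this fails
mixed_fails <- inherits(try(mtcars[, idx], silent = TRUE), "try-error")

# match() converts the name to a position, keeping the index vector numeric
pos <- c(3:4, match("gear", names(mtcars)))
selected <- names(mtcars[, pos])
print(selected)
```

Once every entry is a position, the subset behaves like any other positional selection.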

    Secondly, index only the names of the columns you selected. max.col() returns positions within the subset you pass it, but your code uses those positions to index the full colnames(cdmx) or names(cdmx), which would return the wrong names.

    Try the following -

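    # Columns that must be selected by name
    cols <- c("coal_pri_pvem", "coal_prd_pt_nva_alianza", "coal_prd_pt")
    # Convert the names to column positions
    col_index <- match(cols, names(cdmx))
    # Combine with the positional range, dropping duplicates
    all_index <- unique(c(7:29, col_index))
    # max.col() gives positions within the subset, so index the subset's names
    cdmx$winner <- names(cdmx)[all_index][max.col(cdmx[, all_index], ties.method = "first")]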
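Since the real cdmx data is not available, the following is only a sketch of the same approach on a small made-up data frame. The data frame votes, its column names and its values are all invented for illustration; positions 3:4 play the role of 7:29 and coal_x plays the role of the coalition columns.

```r
# Hypothetical toy data standing in for cdmx (names and values are invented)
votes <- data.frame(
  town     = c("t1", "t2", "t3"),
  sections = c(10, 20, 30),
  pri      = c(5, 1, 7),
  pan      = c(3, 9, 7),
  other    = c(0, 0, 0),
  coal_x   = c(8, 2, 1)
)

# Positions 3:4 plus one column selected by name
all_index <- unique(c(3:4, match("coal_x", names(votes))))
subset_names <- names(votes)[all_index]

# Position of each row's maximum within the subset (first column wins ties)
max_pos <- max.col(as.matrix(votes[, all_index]), ties.method = "first")

# Correct: index the names of the subset
votes$winner <- subset_names[max_pos]

# Incorrect: indexing the full names vector labels rows with unrelated columns
wrong <- names(votes)[max_pos]

print(votes$winner)
print(wrong)
```

Comparing the two printed vectors shows why the names must come from the same subset that was passed to max.col().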