Tags: r, for-loop, dplyr

Use a for loop for a mutate function with an operation


I am trying to use a for loop to create new columns in a data frame, in order to build new variables for analysis, but I keep running into problems.

I have several variables that capture parties' emphasis on various political issues, and I want to create a new variable for every issue that takes into account the vote share the party received (to approximate how much of party x's vote share relates to its emphasis on issue i).

Hence, the new columns would apply the following operation: i_emphasis * pervote.

The code below is adapted from another Stack Overflow answer, since my first couple of attempts went quite poorly, but I still fail to see what is going wrong. Here is the current code:

vars <- c(y2016_ESP$per101:y2016_ESP$per706)

y2016_ESP %>% 
  for (i_var in vars){
      i_emphasis <- paste0("supp_i_",i_var)
  
      mod_y2016_ESP <- y2016_ESP %>%
        mutate(!!sym(i_emphasis) := i_var*pervote)
  }

vars is meant to be the vector containing all the "issue emphasis" columns that the loop should iterate over, and it appears to be the one causing problems. I do not know why, since c() should take every single column and include it in the vector (at least, I think it should do that).

Note: For now I am applying the code to a reduced data frame, since I want to test it and check the results descriptively on this small df before applying it to the broader data frame. I have seen other people use apply() or lapply() for something similar, but I am more familiar with the for loops typical of other programming languages.


Solution

  • I think the following is what you're going for.

    library(dplyr)
    y2016_ESP <- data.frame(
      per101 = 1:5,
      per501 = rnorm(5),
      per706 = runif(5)
      )
    
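    pervote <- .5 # placeholder vote share; a pervote column in the data frame would be picked up the same way inside mutate()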
    
    # mutate creates new columns
    y2016_ESP |>
      dplyr::mutate(
        dplyr::across(per101:per706, # tidy select all columns between `per101` and `per706`
                      ~ .x * pervote, # dplyr style anonymous function
                      .names = "supp_i_{.col}") # glue style naming of new columns
        )
    

    I think your confusion comes from mixing together base R and dplyr's tidy-select concepts.

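    vars <- c(y2016_ESP$per101:y2016_ESP$per706) doesn't select columns in base R: there the : operator builds a numeric sequence running from the first value of per101 to the first value of per706. The per101:per706 range style is, however, one way to select columns in dplyr, for example inside across().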
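
    To make the point about : concrete, here is a small base R sketch. The data values are made up for illustration, and only the column names follow the example above. Depending on the R version and settings, passing whole columns to : produces a warning or an error, so the sketch spells out the first elements explicitly to show what the original expression effectively computes.

    ```r
    # Toy data; the column names follow the answer's example, the values are made up.
    y2016_ESP <- data.frame(
      per101 = c(2, 9, 9),
      per501 = c(7, 7, 7),
      per706 = c(5, 1, 1)
    )

    # The `:` operator works on numbers, not on columns. Given two column vectors
    # it only uses their first elements, so the original expression boils down to
    # this numeric sequence:
    seq_vals <- y2016_ESP$per101[1]:y2016_ESP$per706[1]
    seq_vals # 2 3 4 5

    # One base R way to get the *names* of all columns from per101 through per706:
    cols <- names(y2016_ESP)
    vars <- cols[match("per101", cols):match("per706", cols)]
    vars # "per101" "per501" "per706"
    ```

    A character vector of column names like this is what a for loop can iterate over.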

    glue in the comment above refers to the R glue package, which often works similarly to Python's f-strings.
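
    As a rough illustration of that naming style, the sketch below calls glue directly. It assumes the glue package is available (it is installed as a dependency of dplyr), and the variable name col is just a placeholder chosen for this example.

    ```r
    library(glue)

    # Hypothetical column names to build new names from
    col <- c("per101", "per501")

    # Expressions inside {} are evaluated and pasted into the string,
    # once per element of `col`
    new_names <- glue("supp_i_{col}")
    as.character(new_names) # "supp_i_per101" "supp_i_per501"

    # The base R equivalent used in the question
    paste0("supp_i_", col)
    ```

    Inside across(), the special placeholder {.col} plays the role of col here.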

    You generally want to choose between base R style or dplyr style, and avoid too much mixing of the concepts. (Of course, this isn't always true.)
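
    If you prefer to stay with a for loop, a base R only version might look like the sketch below. It assumes that pervote is a column of the data frame (the question suggests this but does not show the data), and the values are made up. The key difference from the original attempt is that the loop iterates over column names and uses [[ ]] to read and write columns.

    ```r
    # Toy data; pervote as a column is an assumption, the values are made up.
    y2016_ESP <- data.frame(
      per101  = c(1, 2, 3),
      per501  = c(0.5, 1.5, 2.5),
      per706  = c(10, 20, 30),
      pervote = c(0.2, 0.3, 0.5)
    )

    # Names of all columns from per101 through per706 (pervote is not included)
    cols <- names(y2016_ESP)
    vars <- cols[match("per101", cols):match("per706", cols)]

    # Work on a copy so the original data frame stays untouched
    mod_y2016_ESP <- y2016_ESP

    for (i_var in vars) {
      new_name <- paste0("supp_i_", i_var)
      # [[ ]] looks a column up by its name stored in a string
      mod_y2016_ESP[[new_name]] <- mod_y2016_ESP[[i_var]] * mod_y2016_ESP$pervote
    }

    names(mod_y2016_ESP)
    ```

    This produces the same supp_i_ columns as the across() call, one loop iteration per issue column.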

    I recommend the book R for Data Science if you will be working with data.frames a lot.