rdplyrrenamesurveymonkey

Renaming Columns by Pasting First Row if not NA


I am looking to standardize cleaning exports from Survey Monkey at my organization, and want to rename the column name to (Column name + first row name) if the first row is not NA.

Edit: This would ideally be implemented in a function/loop so that it would work on data frames of varying sizes, without having to edit any parameters.

Reprex:

df <- tribble(
  ~`Which of these choices do you like`, ~`...1`, ~`...2`, ~`...3`, ~`Respondent ID`, ~`Different Text`, ~`...4`,
  'Fruit', 'Drink', 'Dessert', 'Snack', NA, 'Pizza Topping', 'Pizza Style',
  'Apple', 'Water', 'Pie', 'Oreos', 1234, 'Mushroom', 'Deep Dish',
  'Apple', 'Coffee', 'Cake', 'Granola', 1235, 'Onion', 'NY Style',
  'Banana', 'Coffee', 'Pie', 'Oreos', 1236, 'Mushroom', 'NY Style',
  'Pear', 'Vodka', 'Pie', 'Granola', 1237, 'Onion', 'Deep Dish'
)

After the columns are renamed, I would delete the first row and carry on with my life.

Ideally, my df would look like this:

enter image description here

Thank you for any guidance!


Solution

  • In base R, we can use paste and then remove the first row

    names(df)[1:4] <- paste0(names(df)[1], unlist(df[1, 1:4]))
    df <- df[-1, ]
    

    Or using sprintf

    names(df)[1:4] <- sprintf("%s (%s)", names(df)[1], unlist(df[1, 1:4]))
    df <- df[-1,]
    

    If we want to do this by checking the NA elements

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(stringr)
    keydat <- df %>%
              slice(1) %>% 
              select_if(negate(is.na)) %>%
              pivot_longer(everything()) %>%
              group_by(grp = cumsum(!startsWith(name, "..."))) %>% 
              mutate(value = sprintf("%s (%s)", first(name), value)) %>% 
              ungroup %>% 
              select(-grp)
    
    
    df <- df %>%
            rename_at(vars(keydat$name), ~ keydat$value) %>%
            slice(-1)
    
    df
    # A tibble: 4 x 7
    #  `Which of these… `Which of these… `Which of these… `Which of these… `Respondent ID`
    #  <chr>            <chr>            <chr>            <chr>                      <dbl>
    #1 Apple            Water            Pie              Oreos                       1234
    #2 Apple            Coffee           Cake             Granola                     1235
    #3 Banana           Coffee           Pie              Oreos                       1236
    #4 Pear             Vodka            Pie              Granola                     1237
    # … with 2 more variables: `Different Text (Pizza Topping)` <chr>, `Different Text (Pizza
    #   Style)` <chr>
    
    names(df)
    #[1] "Which of these choices do you like (Fruit)"   "Which of these choices do you like (Drink)"  
    #[3] "Which of these choices do you like (Dessert)" "Which of these choices do you like (Snack)"  
    #[5] "Respondent ID"                                "Different Text (Pizza Topping)"              
    #[7] "Different Text (Pizza Style)"