Tags: r, string, character, lapply, paste

How to paste a character variable in a lapply function?


I asked participants to write three words and to assign a number to each word. Some words have been written by several participants, other words by only one participant. Now I'm trying to create a set of variables, one for each word mentioned by participants, containing the value that each participant has assigned to that word (if s/he wrote that word). I wrote a function to do this, but it does not give the expected output. I guess that this is due to the wrong interpretation, in the function, of my character vector allwords.

I created the following sample data to illustrate the issue.

data <- data.frame(
words1 = c("apple", "pear", "banana", "pear", "banana"),
words2 = c("pear", "banana", "pear", "banana", "cherry"),
words3 = c("banana", "ananas", "apple", "melon", "pear"),
value1 = c(2, 1, 2, 0, 1),
value2 = c(2, 0, 0, 2, 0),
value3 = c(0, 2, 2, 1, 1)
)
allwords <- c("apple", "pear", "banana", "ananas", "melon", "cherry")
attach(data)
head(data)
  words1 words2 words3 value1 value2 value3
1  apple   pear banana      2      2      0
2   pear banana ananas      1      0      2
3 banana   pear  apple      2      0      2
4   pear banana  melon      0      2      1
5 banana cherry   pear      1      0      1

I want to create a set of vectors, each one dedicated to one of the words in allwords, reporting the value that each participant assigned to that word (NA if no value assigned). This is the output I am trying to get:

apple pear banana ananas melon cherry
2     2    0      NA     NA      NA
NA    1    0      2      NA      NA 
2     0    2      NA     NA      NA
NA    0    2      NA     1       NA
NA    1    1      NA     NA      0

I wrote this function to achieve this:

value.f <- function(y){
  values.w[[y]] <- NA
  value.var <- values.w[[y]]
  value.var[which(data$words1 == y)] <- data$value1
  value.var[which(data$words2 == y)] <- data$value2
  value.var[which(data$words3 == y)] <- data$value3
}
values.w <- list()
values.w <- lapply(allwords, value.f)
names(values.w)  <- c(allwords)

But what I get is, for each word, the content of data$value3. Basically, all "which" conditions are found true, but I do not understand why.

as.data.frame(values.w)

apple pear banana ananas melon cherry
0     0     0     0     0      0     
2     2     2     2     2      2 
2     2     2     2     2      2     
1     1     1     1     1      1     
1     1     1     1     1      1     

I do not understand what I am doing wrong, but I struggle a lot using character vectors in lapply() functions so I guess that this is the kind of issue I have here.

I tried eval(parse(text=y)) and paste0(y), but neither works.


Solution

  • One possibility: use {dplyr}, whose bind_rows() can row-bind data frames with differing sets of columns, filling the gaps with NA, instead of raising an error because the variables don't match (as rbind() would).

    ##   words1 words2 words3 value1 value2 value3
    ## 1  apple   pear banana      2      2      0
    ## 2   pear banana ananas      1      0      2
    ## 3 banana   pear  apple      2      0      2
    ## 4   pear banana  melon      0      2      1
    ## 5 banana cherry   pear      1      0      1
    
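        ## turn one row into a one-row data frame: the value columns,
        ## renamed after the words written in that row
        row_to_named_list <- \(row, itemcount = 3){
          setNames(row[1:itemcount + itemcount],
                   row[1:itemcount]
          )
        }
    
        library(dplyr)
    
        ## one renamed data frame per row, then row-bind them all
        do.call(dplyr::bind_rows,
              lapply(seq_len(nrow(data)), \(r) row_to_named_list(data[r, ]))
        )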
    

    output:

    ##   apple pear banana ananas melon cherry
    ## 1     2    2      0     NA    NA     NA
    ## 2    NA    1      0      2    NA     NA
    ## 3     2    0      2     NA    NA     NA
    ## 4    NA    0      2     NA     1     NA
    ## 5    NA    1      1     NA    NA      0
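
If you would rather avoid the {dplyr} dependency, the same table can probably be built in base R with matrix indexing. This is only a rough sketch: it assumes every word in the words columns also appears in allwords, and the names words, values, idx and result are made up for illustration.

```r
data <- data.frame(
  words1 = c("apple", "pear", "banana", "pear", "banana"),
  words2 = c("pear", "banana", "pear", "banana", "cherry"),
  words3 = c("banana", "ananas", "apple", "melon", "pear"),
  value1 = c(2, 1, 2, 0, 1),
  value2 = c(2, 0, 0, 2, 0),
  value3 = c(0, 2, 2, 1, 1),
  stringsAsFactors = FALSE
)
allwords <- c("apple", "pear", "banana", "ananas", "melon", "cherry")

## split the data into a word matrix and a value matrix of equal shape
words  <- as.matrix(data[paste0("words", 1:3)])
values <- as.matrix(data[paste0("value", 1:3)])

## empty result: one row per participant, one column per word
result <- matrix(NA_real_, nrow = nrow(data), ncol = length(allwords),
                 dimnames = list(NULL, allwords))

## two-column index matrix: (participant row, column of the word)
## assumption: match() never returns NA, i.e. all words are in allwords
idx <- cbind(as.vector(row(words)), match(as.vector(words), allwords))
result[idx] <- as.vector(values)

result <- as.data.frame(result)
print(result)
```

If a participant wrote the same word twice, the later column would overwrite the earlier one in this sketch.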
    

    edit

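    Below is an adapted version of your function which works as expected. The major glitches were: the function never returned value.var (an R function returns the value of its last expression, and the value of an assignment is its right-hand side, so every call returned data$value3); value.var started out as a single NA rather than a vector with one entry per participant; and each replacement assigned the whole value column instead of only the entries at the matching positions.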

    value.f <- function(y){
      ## `<-` can't change objects outside the function from within
      ## the function anyway:
      ## values.w[[y]] <- NA
      ## this would create a single-item value.var containing only NA
      ## value.var <- values.w[[y]]
      value.var <- rep(NA, nrow(data)) ## instantiate value.var as an NA-vector with one entry per row
      ## replacement values must have same length as replacement positions:
      value.var[which(data$words1 == y)] <- data$value1[which(data$words1 == y)]
      value.var[which(data$words2 == y)] <- data$value2[which(data$words2 == y)]
      value.var[which(data$words3 == y)] <- data$value3[which(data$words3 == y)]
      ## don't forget to return value.var!
      value.var
    }
    
    ## values.w <- list() # lapply returns a list anyway
    values.w <- lapply(allwords, value.f)
    names(values.w)  <- c(allwords)
    
    list2DF(values.w) ## make this a dataframe
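
To see why a function that ends in an assignment hands back the assigned value rather than the modified vector, here is a tiny sketch. The toy functions f_assign and f_return are made up for this illustration and are not part of your code.

```r
## hypothetical toy functions, only meant to show the return behaviour
f_assign <- function() {
  x <- rep(NA, 3)
  x[2] <- 10      # last expression is an assignment
}

f_return <- function() {
  x <- rep(NA, 3)
  x[2] <- 10
  x               # last expression is the vector itself
}

res_assign <- f_assign()  # the right-hand side of the assignment
res_return <- f_return()  # the whole modified vector
print(res_assign)
print(res_return)
```

The first call yields only the replacement value, which mirrors how a function whose last line is `value.var[...] <- data$value3` gives back data$value3.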