rdata.table

R data.table replace values in multiple columns


I am new to R and new to stackoverflow. I am trying to figure out data.table and looked at "R data.table replacing an index of values from another data.table" and thought I understood but can't get what I want.

I have two data frames- the first is the data I'm interested in and the second is a key containing names/ids to translate the ids in the first data frame. I would like to use the "key" data.table to convert the numbers in table$id1 and table$id2 into the "Names" in the "key" data.table. Here's what I've managed so far:

table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6), 
              "id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
              "id2" = as.character(sample(1:6, 6))) 

key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"), 
            "id1" = as.character(1:6))

setkey(table, id1)
setkey(key, id1)

table[key, `:=`(id1 = i.Name)]

I've gotten this far (substituted values for names in table$id1) but can't figure out how to also change id2 without changing the column names, resetting the keys and re-doing the same step above for id2. In the real data set, there will be multiple Sally's, John's etc. and I want the code to "translate" both columns using the same key.

Hoping the code uses data.table (for learning purposes) but if there is another package that will do this better, that would be great too. Thanks!


Solution

  • In data.table you don't need to set the keys to do a join. You can specify the join columns in the on= argument.

    And as of data.table v1.9.6 you can join on different column names using the on= argument.

    library(data.table) ## v1.9.6 +
    
    ## update id1 based on Name
    table[ key, on = c("id1"), nomatch = 0, id1 := i.Name]
    ## here the id1 column is getting updated to i.Name 
    ## (the 'i.' is the prefix given to columns on the 'right' side of the join).
    
    ## update id2 based on Name
    table[ key, on = c(id2 = "id1"), nomatch = 0, id2 := i.Name]
    
    table
    
    #   Sample Conc1    id1 Conc2    id2
    #1:     40   100   John   201   John
    #2:     43   101 Kelsey   206 Kelsey
    #3:     45   103  Molly   205  Roger
    #4:     42   102  Roger   204    Bob
    #5:     44   104  Sally   200  Molly
    #6:     41   105    Bob   202  Sally
    

    Data

    ## setting seed because we are sampling
    set.seed(1234)
    table<-data.table("Sample" = sample(40:46, 6), "Conc1" = sample(100:106,6), 
                      "id1" = as.character(sample(1:6, 6)), "Conc2" = sample(200:206,6),
                      "id2" = as.character(sample(1:6, 6))) 
    
    key<-data.table("Name" = c("Sally", "John", "Roger", "Bob", "Kelsey", "Molly"), 
                    "id1" = as.character(1:6))