r data.table frequency

What is the simplest way to label a data.table frequency table in R?


I'm curious what the easiest way to label a data.table frequency table is. For example, say I have a data.table (dt1) with a column (animals) containing c(1,1,2,2,2,2,3,3,3,3,3,3)

and I use:

dt2 <- dt1[, .N ,by = animals]

to get a frequency table (dt2):

animals N
1: 1 2
2: 2 4
3: 3 6

What's the most elegant way to relabel the values in the animals column, given this mapping:

1 = turtle
2 = horse
3 = cat
4 = dog

This is easy to do by reference if there is no 4/dog:

dt2[animals == c(1:3), animalnames := c("turtle", "horse", "cat")]

However, this presents two issues:

  1. It creates a new column, so the "animals" column still needs to be dealt with. Not a big deal, but a more elegant solution would be nice.
  2. Including "dog", a label whose code doesn't exist in the "animals" column, causes an error:

dt2[animals == c(1:4), animalnames := c("turtle", "horse", "cat", "dog")]

Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(),  : 
  RHS of == is length 4 which is not 1 or nrow (3). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.

What's the simplest solution if you a) want to relabel the "animals" column, and b) want to use a list of possible labels that may or may not occur in a given sample (i.e. another sample might contain dog, and another might not contain cat, but I want to use the same code for all samples)?

Thanks!


Solution

  • You could use fcase:

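    # Overwrite the numeric codes with labels; conditions that never
    # match (e.g. animals == 4 here) are simply skipped.
    dt2[, animals := fcase(animals == 1, 'turtle',
                           animals == 2, 'horse',
                           animals == 3, 'cat',
                           animals == 4, 'dog',
                           # catch-all for any code not listed above
                           rep(TRUE, .N), 'unknown')]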
    dt2
    
    #   animals     N
    #    <char> <int>
    #1:  turtle     2
    #2:   horse     4
    #3:     cat     6
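
As a possible alternative to the fcase answer above, the full set of labels could be kept in a named lookup vector and indexed by code, so the same code runs on any sample whether or not it contains every animal. This is only a minimal sketch: `animal_labels` is a hypothetical name introduced for illustration, and the data is rebuilt from the question's example.

```r
library(data.table)

# Example data from the question
dt1 <- data.table(animals = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3))
dt2 <- dt1[, .N, by = animals]

# Hypothetical lookup vector: names are the codes, values are the labels.
# It may list labels (here "dog") that never occur in the sample.
animal_labels <- c("1" = "turtle", "2" = "horse", "3" = "cat", "4" = "dog")

# Index the lookup by code (converted to character) and overwrite the
# animals column by reference; unname() drops the leftover code names.
dt2[, animals := unname(animal_labels[as.character(animals)])]
dt2
```

Unlike the fcase version, a code that is missing from the lookup vector ends up as NA rather than 'unknown', so which approach fits better depends on how unexpected codes should be handled.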