sapply prints an extra line of output


I have a vector in R which is like this: vec <- c('abc','def')

Here is my dummy input data frame:

Gen value
abc 12
def 34
abc 12
abc 13
def 1
abc 4
abc 6
ghi 23
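
For anyone who wants to run the code, the table above can be recreated roughly like this. The name data is the one my sapply call below uses; treating value as a numeric column is my assumption.

```r
# Dummy input, same rows as the table above
data <- data.frame(
  Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
  value = c(12, 34, 12, 13, 1, 4, 6, 23)
)
```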

I am using sapply on this vec like this:

vec <- c('abc', 'def')

sapply(vec, function(x){
    print(x)
    df_mut <- data[data$Gen == x,]
    df_nonmut <- data[data$Gen != x,]
    print(nrow(df_mut))
    print(nrow(df_nonmut))
})

This gives me the output below:

[1] "abc"
[1] 703
[1] 715
[1] "def"
[1] 251
[1] 1167
abc: 715 def: 1167

Why am I getting the line abc: 715 def: 1167?

Also, I want to add these values in a data frame so it looks like this:

gene  mut  nonmut
abc   703  715
def   251  1167

How can I achieve that?


Solution

  • Why am I getting the line abc: 715 def: 1167?

    I think two things are happening that might be confusing to you:

    1. print invisibly returns its argument
    2. sapply, with its default arguments, returns a named vector here

    Here is an example of #1:

    y <- print(1) # notice there is now a 'y' in your environment
    # [1] 1       # printed to the console as a side effect of print
    
    y
    # [1] 1
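
    If you want to check the "invisibly" part directly, base R's withVisible() reports whether a value was marked invisible. A small sketch (the name res is made up for this example):

    ```r
    res <- withVisible(print(1))  # still prints "[1] 1" as a side effect
    res$value    # the value print() returned: 1
    res$visible  # FALSE, i.e. the value came back invisibly
    ```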
    

    Here is an example of #2:

    sapply(vec, \(x) 1)
    # abc def 
    #   1   1 
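
    The names come from sapply's USE.NAMES = TRUE default, which uses a character input vector as the names of the result. For comparison, here is a sketch with that default switched off, and with lapply, which does no simplifying at all (vec is redefined only so the snippet runs on its own):

    ```r
    vec <- c('abc', 'def')

    sapply(vec, \(x) 1, USE.NAMES = FALSE)  # plain numeric vector, no names
    # [1] 1 1

    lapply(vec, \(x) 1)                     # a list with one element per input
    ```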
    

    Putting these two pieces together:

    sapply(vec, \(x) {print('first print'); print('second print')})
    # [1] "first print"
    # [1] "second print"
    # [1] "first print"
    # [1] "second print"
    #            abc            def 
    # "second print" "second print" 
    

    With this in mind, you can see what is happening in your code. In the output you posted, the last print call of the first iteration invisibly returns 715, and the last print call of the second iteration invisibly returns 1167. A function returns the value of its last expression, so those are the return values of your two iterations. sapply then simplifies them into a named vector. So that last line does not come from any of your print calls. It is the return value of sapply, which R prints automatically at the console because you have not assigned it to anything. Compare what happens when the result is assigned (using the dummy data from your question):

    # notice assignment
    output <- sapply(vec, function(x){
      print(x)
      df_mut <- data[data$Gen == x,]
      df_nonmut <- data[data$Gen != x,]
      print(nrow(df_mut))
      print(nrow(df_nonmut))
    }
    )
    # [1] "abc"
    # [1] 5
    # [1] 3
    # [1] "def"
    # [1] 2
    # [1] 6
    
    # result of assignment
    output
    # abc def 
    #   3   6 
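
    If you only want the printed lines and not the trailing named vector, two options are to wrap the call in invisible() or to use a plain for loop, which is run only for its side effects. A minimal sketch, assuming data holds the dummy rows from the question:

    ```r
    data <- data.frame(
      Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
      value = c(12, 34, 12, 13, 1, 4, 6, 23)
    )
    vec <- c("abc", "def")

    # Option 1: keep sapply, but hide its return value
    invisible(sapply(vec, function(x) {
      print(x)
      print(sum(data$Gen == x))  # rows where Gen matches
      print(sum(data$Gen != x))  # rows where it does not
    }))

    # Option 2: a for loop, which has no result to auto-print
    for (x in vec) {
      print(x)
      print(sum(data$Gen == x))
      print(sum(data$Gen != x))
    }
    ```

    Either way the print calls still run; only the automatic printing of the overall result goes away.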
    

  • How can I achieve that?

    @Ben Bolker already has a base R answer. This is the same answer using dplyr from the tidyverse (just as an alternative option):

    library(dplyr)
    
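    data |>
      filter(Gen %in% vec) |>              # keep only the genes of interest
      count(Gen, name = "mut") |>          # number of rows per gene
      mutate(nonmut = nrow(data) - mut) |> # every other row counts as non-mutated
      rename(gene = Gen)                   # match the column names you asked for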
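
    Since the question started from sapply, here is a base R sketch that keeps that approach and returns both counts from the function instead of printing them. It is not necessarily how the other answer does it; data is rebuilt from the dummy rows, and counts and result are names made up for this example.

    ```r
    data <- data.frame(
      Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
      value = c(12, 34, 12, 13, 1, 4, 6, 23)
    )
    vec <- c("abc", "def")

    # Each call returns a named vector of length 2, so sapply simplifies
    # the results to a matrix with one column per gene
    counts <- sapply(vec, function(x) {
      c(mut = sum(data$Gen == x), nonmut = sum(data$Gen != x))
    })

    # Transpose so genes become rows, then move the gene names into a column
    result <- data.frame(gene = colnames(counts), t(counts))
    rownames(result) <- NULL
    result
    #   gene mut nonmut
    # 1  abc   5      3
    # 2  def   2      6
    ```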