I have a vector in R which is like this:
vec <- c('abc','def')
Here is my dummy input data frame:
Gen value
abc 12
def 34
abc 12
abc 13
def 1
abc 4
abc 6
ghi 23
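To make the example reproducible, here is one way to build the dummy input as an R data frame. The object name `data` is an assumption, chosen to match the code below. The 703/715/251/1167 counts in the output further down clearly come from a larger data set, not from these 8 rows.

```r
# Dummy input from the question; named `data` to match the sapply() code
data <- data.frame(
  Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
  value = c(12, 34, 12, 13, 1, 4, 6, 23)
)
vec <- c("abc", "def")
```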
I am using sapply on this vec like this:
vec <- c('abc', 'def')
sapply(vec, function(x){
print(x)
df_mut <- data[data$Gen == x,]
df_nonmut <- data[data$Gen != x,]
print(nrow(df_mut))
print(nrow(df_nonmut))
})
This gives me the following output:
[1] "abc"
[1] 703
[1] 715
[1] "def"
[1] 251
[1] 1167
abc: 715 def: 1167
Why am I getting the last line, abc: 715 def: 1167?
Also, I want to collect these values in a data frame that looks like this:
gene mut nonmut
abc 703 715
def 251 1167
How can I achieve that?
Why am I getting the line abc: 715 def: 1167?
I think two things are happening that might be confusing to you:
1. print invisibly returns its argument.
2. sapply, with its default arguments (USE.NAMES = TRUE), returns a named vector here.
Here is an example of #1:
y <- print(1) # notice there is now a 'y' in your environment
# [1] 1 # this is printed to the console as a side effect of print
y
# [1] 1
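One way to see the "invisible" part directly is base R's withVisible(), which evaluates an expression and returns both its value and a flag saying whether R would auto-print it. A small sketch; the function name f is only for illustration.

```r
# withVisible() returns list(value = ..., visible = TRUE/FALSE)
res <- withVisible(print(1))  # "[1] 1" is still printed, as a side effect of print()
res$value    # 1: print() hands back its argument
res$visible  # FALSE: the value is flagged invisible, so it is not auto-printed again

# A function whose last expression is print() returns that value the same way
f <- function() print("hello")
out <- f()   # prints [1] "hello" once; `out` now holds "hello"
```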
Here is an example of #2:
sapply(vec, \(x) 1)
# abc def
# 1 1
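If you do not want those names, sapply's USE.NAMES argument switches them off. vapply() is a related base R function that also checks the type and length of each result and keeps the names by default. A short sketch:

```r
vec <- c("abc", "def")

# Default USE.NAMES = TRUE: because `vec` is character, its values become the names
named <- sapply(vec, function(x) 1)
names(named)    # "abc" "def"

# Turn the naming off
unnamed <- sapply(vec, function(x) 1, USE.NAMES = FALSE)
names(unnamed)  # NULL

# vapply() requires each result to match the template numeric(1)
typed <- vapply(vec, function(x) 1, numeric(1))
```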
Putting these two pieces together:
sapply(vec, \(x) {print('first print'); print('second print')})
# [1] "first print"
# [1] "second print"
# [1] "first print"
# [1] "second print"
# abc def
# "second print" "second print"
With this in mind, you can see what is happening in your code. The last print statement of your first iteration returns 715, and the last print statement of your second iteration returns 1167. Because print returns its argument (invisibly), those values become the return values of your function for each iteration. sapply then, in an attempt to simplify the output, combines them into a named vector. So that last line is not produced by a print call inside your function; it is the return value of sapply, which R prints at the console because you have not assigned it to anything:
# notice assignment
output <- sapply(vec, function(x){
print(x)
df_mut <- data[data$Gen == x,]
df_nonmut <- data[data$Gen != x,]
print(nrow(df_mut))
print(nrow(df_nonmut))
})
# [1] "abc"
# [1] 5
# [1] 3
# [1] "def"
# [1] 2
# [1] 6
# result of assignment
output
# abc def
# 3 6
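If the goal is only to get rid of that extra line, a minimal sketch is to wrap the call in invisible(), which marks the result of sapply as not to be auto-printed. The helper name count_rows is hypothetical, and the dummy data is rebuilt so the snippet runs on its own.

```r
data <- data.frame(
  Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
  value = c(12, 34, 12, 13, 1, 4, 6, 23)
)
vec <- c("abc", "def")

# Hypothetical helper: prints the gene and its row count, returns the count
count_rows <- function(x) {
  print(x)
  print(sum(data$Gen == x))
}

# invisible() stops R from auto-printing the named vector sapply returns
invisible(sapply(vec, count_rows))
```

Only the print() lines from inside the function reach the console; the named vector is still computed and can be assigned if you need it.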
How can I achieve that?
@Ben Bolker already has a base R answer. This is the same answer using dplyr from the tidyverse (just as an alternative option):
library(dplyr)
data |>
  filter(Gen %in% vec) |>           # keep only the genes of interest
  count(Gen, name = "mut") |>       # number of rows per gene
  mutate(nonmut = nrow(data) - mut) # all remaining rows
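For comparison, here is a base R sketch that builds the requested table directly. It is not necessarily what Ben Bolker's answer does: it counts matches with sum() on a logical vector instead of subsetting and calling nrow(), which gives the same numbers as long as Gen has no missing values. The dummy data from the question is rebuilt so the snippet is self-contained.

```r
data <- data.frame(
  Gen   = c("abc", "def", "abc", "abc", "def", "abc", "abc", "ghi"),
  value = c(12, 34, 12, 13, 1, 4, 6, 23)
)
vec <- c("abc", "def")

# One row per gene: rows where Gen == x ("mut") and all other rows ("nonmut")
result <- data.frame(
  gene   = vec,
  mut    = sapply(vec, function(x) sum(data$Gen == x), USE.NAMES = FALSE),
  nonmut = sapply(vec, function(x) sum(data$Gen != x), USE.NAMES = FALSE)
)
result
#   gene mut nonmut
# 1  abc   5      3
# 2  def   2      6
```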