I have a df with data from a qPCR run:
df_1 <- structure(list(
row = c("A", "A", "A", "A", "B", "B"),
column = c(17L, 18L, 19L, 20L, 17L, 18L),
Treatment = c("Clp-1", "Clp-1","Clp-1", "Clp-1", "Clp-1", "Clp-1"),
Time = c("1h", "1h", "1h", "1h", "1h", "1h"),
Sample_Nr = c("1.1", "1.1", "1.1", "1.1", "1.2", "1.2"),
Target_Name = c("ClP-1", "ClP-1", "ClP-1", "ClP-1", "ClP-1", "ClP-1"),
Task = c("UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN","UNKNOWN"),
Reporter = c("SYBR", "SYBR", "SYBR", "SYBR", "SYBR", "SYBR"),
CT = c(30.7594337463379, 29.7701301574707,31.2958374023438,
29.883508682251, 28.765043258667, 28.3563442230225)),
row.names = c(NA, 6L), class = "data.frame")
This is an excerpt of the df.
I'm trying to find the n-1 closest Ct values based on the criteria "Sample_Nr" & "Target_Name" to calculate their average for downstream analysis.
I found this solution online so far:
n = 4
df_1 <- df %>% group_by(Sample_Nr,Target_Name, Treatment, Time) %>%
count("CT") %>% do(data.frame(findClosest(.$CT,n)))
Based on: How to find the three closest (nearest) values within a vector?
My problem now is that "n" is a fixed value, but sometimes I have only three Ct values instead of four for a set of technical replicates (the missing one will be an "NA" in the df). In such a case the findClosest() function can't be applied to the df, because n would be 4 by default (usually there are four technical replicates per condition).
How can I still use this function but adjusted to the number of Ct values I have for each condition?
So far I've tried the following:
a = df %>% group_by(Sample_Nr,Target_Name, Treatment, Time) %>% filter(!is.na(CT))
Vector_df1 <- c(table(a$Sample_Nr, a$Target_Name))
I tried to pass "Vector_df1" as my new "n" to findClosest(), but this doesn't work.
Error message:
There were 50 or more warnings (Show first 50 warnings using warnings())
Warning:
1: Unknown or uninitialised column: CT.
2: In 0:(n - 1) : numeric expression has 81 elements: only first one is used.
...
49: Unknown or uninitialised column: CT.
50: In 0:(n - 1) : numeric expression has 81 elements: only first one is used.
PS:
I apologize if this post is too long or anything. I tried to be precise and include all relevant information. It's also my first post.
Here is a way. Change the function findClosest to check whether the vector is shorter than n and, if it is, reduce n to the vector's length.
suppressPackageStartupMessages({
library(dplyr)
})
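findClosest <- function(vec, n) {
  require(zoo)
  # sort() drops NA values, so cap n at the number of non-missing values
  vec1 <- sort(vec)
  if(n > length(vec1)) n <- length(vec1)
  # for each window of n sorted values: total spread first, then the values
  m1 <- rollapply(vec1, n, by = 1, function(i) c(sum(diff(i)), c(i)))
  # keep the window with the smallest spread and drop the spread itself
  return(m1[which.min(m1[, 1]),][-1])
}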
n <- 4
df_1 %>%
group_by(Sample_Nr, Target_Name) %>%
summarise(Closest = findClosest(CT, n), .groups = "drop")
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
#> # A tibble: 6 × 3
#> Sample_Nr Target_Name Closest
#> <chr> <chr> <dbl>
#> 1 1.1 ClP-1 29.8
#> 2 1.1 ClP-1 29.9
#> 3 1.1 ClP-1 30.8
#> 4 1.1 ClP-1 31.3
#> 5 1.2 ClP-1 28.4
#> 6 1.2 ClP-1 28.8
Created on 2022-08-12 by the reprex package (v2.0.1)
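Since the question mentions that a missing replicate shows up as NA, it may help to see the length check applied after the NA values have been dropped. The following is only a minimal base-R sketch of that idea, not the zoo-based function above. The name closest_window is made up for illustration, and it assumes that "closest" means the window of n sorted values with the smallest range, which is what sum(diff(i)) amounts to for sorted values.

```r
# Hypothetical base-R variant: drop NA first, then cap n at what is left
closest_window <- function(vec, n) {
  vec1 <- sort(vec)                 # sort() removes NA values by default
  n <- min(n, length(vec1))         # never ask for more values than exist
  if (n < 1) return(numeric(0))     # nothing left after dropping NA
  starts <- seq_len(length(vec1) - n + 1)
  # range (last minus first) of each window of n consecutive sorted values
  spread <- vapply(starts, function(s) vec1[s + n - 1] - vec1[s], numeric(1))
  s <- starts[which.min(spread)]    # first window with the smallest range
  vec1[s:(s + n - 1)]
}

# One replicate is missing: only three values remain, so all three are returned
closest_window(c(30.76, 29.77, NA, 29.88), n = 4)

# Five values, n = 3: the three values that lie closest together are returned
closest_window(c(1, 2, 10, 11, 11.5), n = 3)
```

Capping n after the NA values are removed means the same call works for groups with four, three or fewer valid Ct values.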
To keep the n - 1 rows that minimize the variance of Closest, I have written an auxiliary function smallest_var. It takes the n elements of its input, computes the variance of every combination of n - 1 of them, and returns the indices of the first combination with the minimum variance. Those indices are then matched against the row numbers within each group, and only the matching rows are kept.
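smallest_var <- function(x) {
  n <- length(x)
  if(n > 2) {
    # each column of inx holds the indices of one (n - 1)-element subset
    inx <- combn(seq_along(x), n - 1L)
    # variance of each subset
    v <- apply(inx, 2, \(i) var( x[i] ))
    # indices of the first subset with the smallest variance
    inx[, which.min(v) , drop = TRUE]
  } else seq_along(x)
}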
n <- 4
df_1 %>%
group_by(Sample_Nr, Target_Name) %>%
summarise(Closest = findClosest(CT, n)) %>%
filter(row_number() %in% smallest_var(Closest)) %>%
ungroup()
#> `summarise()` has grouped output by 'Sample_Nr', 'Target_Name'. You can
#> override using the `.groups` argument.
#> # A tibble: 5 × 3
#> Sample_Nr Target_Name Closest
#> <chr> <chr> <dbl>
#> 1 1.1 ClP-1 29.8
#> 2 1.1 ClP-1 29.9
#> 3 1.1 ClP-1 30.8
#> 4 1.2 ClP-1 28.4
#> 5 1.2 ClP-1 28.8
Created on 2022-08-12 by the reprex package (v2.0.1)
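The question's stated goal is the average of the retained Ct values, so the same selection can also be folded into a single number per group. This is a rough base-R sketch under that assumption. The helper name mean_closest is hypothetical, and it repeats the smallest-variance selection from above rather than calling smallest_var, so that it runs on its own.

```r
# CT values of Sample_Nr 1.1 from the example data in the question
ct <- c(30.7594337463379, 29.7701301574707, 31.2958374023438, 29.883508682251)

# Hypothetical helper: mean of the (n - 1) values with the smallest variance
mean_closest <- function(x) {
  x <- x[!is.na(x)]                      # ignore missing replicates
  n <- length(x)
  if (n <= 2) return(mean(x))            # too few values to drop one
  inx <- combn(seq_along(x), n - 1L)     # one (n - 1)-subset per column
  v <- apply(inx, 2, function(i) var(x[i]))
  mean(x[inx[, which.min(v)]])           # average the tightest subset
}

mean_closest(ct)   # averages the three closest values: 29.77, 29.88 and 30.76
```

Inside a grouped pipeline this could presumably be used as summarise(mean_CT = mean_closest(CT), .groups = "drop") after group_by(Sample_Nr, Target_Name), which gives one averaged value per sample and target.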