Apologies if the title of the post is a bit confusing. Let's say I have the following data frame:
set.seed(123)
test <- data.frame(
  chr = rep("chr1", 30),
  position = sample(1:50, 30, replace = FALSE),
  info = sample(c("X", "Y"), 30, replace = TRUE),
  condition = sample(c("soft", "stiff"), 30, replace = TRUE)
)
## head(test)
chr position info condition
1 chr1 31 Y soft
2 chr1 15 Y soft
3 chr1 14 X soft
4 chr1 3 X soft
5 chr1 42 X stiff
6 chr1 43 X stiff
I want to bin the position column, say with a bin size of 10. Then, for each condition (either soft or stiff), I would like to count the occurrences of each value in the info column. So the data would look something like this (not the actual result from the data above):
chr start end condition count_Y count_X
1 chr1 1 10 soft 2 3
2 chr1 1 10 stiff 0 2
3 chr1 11 20 soft 2 5
4 chr1 11 20 stiff 1 2
5 chr1 21 30 soft 2 0
6 chr1 21 30 stiff 0 4
To make it easier, it is probably better to create two data frames based on condition and then apply the binning and counting, but I am stuck on this part. Any help is appreciated. Many thanks.
Using cut or, even easier, integer division %/% for the binning (thanks to @MrFlick for the hint), together with dplyr::count and tidyr::pivot_wider, you could do:
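library(dplyr, warn.conflicts = FALSE)
library(tidyr)

test |>
  mutate(
    # bin index via integer division; note that exact multiples of 10
    # (10, 20, ...) fall into the next bin with this formula
    bin = position %/% 10 + 1,
    start = (bin - 1) * 10 + 1,
    end = bin * 10
  ) |>
  # one row per bin, condition and info value
  count(chr, start, end, condition, info) |>
  # spread the info values into count_X / count_Y columns
  tidyr::pivot_wider(
    names_from = info,
    values_from = n,
    names_prefix = "count_",
    values_fill = 0
  )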
#> # A tibble: 9 × 6
#> chr start end condition count_X count_Y
#> <chr> <dbl> <dbl> <chr> <int> <int>
#> 1 chr1 1 10 soft 4 0
#> 2 chr1 1 10 stiff 2 1
#> 3 chr1 11 20 soft 3 3
#> 4 chr1 21 30 soft 1 1
#> 5 chr1 21 30 stiff 3 1
#> 6 chr1 31 40 soft 0 2
#> 7 chr1 31 40 stiff 2 1
#> 8 chr1 41 50 soft 0 1
#> 9 chr1 41 50 stiff 4 1
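One caveat about the bin edges: with position %/% 10 + 1, a position that is an exact multiple of 10 lands in the following bin (10 is reported as 11-20). If the bins are meant to be 1-10, 11-20 and so on, as in the question's example, subtracting 1 before the division should do it. Below is a minimal base-R sketch of that variant, with cut() shown for comparison. The position vector and the bin_size variable are made up for illustration and are not part of the original data.

```r
# Illustrative positions sitting on and around the bin edges (not the original data)
position <- c(1, 9, 10, 11, 20, 21, 50)
bin_size <- 10  # assumed bin width, as in the question

# Shift by one before dividing so that 1-10 share a bin, 11-20 the next, etc.
bin   <- (position - 1) %/% bin_size + 1
start <- (bin - 1) * bin_size + 1
end   <- bin * bin_size

# Same bins via cut(): intervals are right-closed by default, i.e. (0,10], (10,20], ...
bin_cut <- cut(position, breaks = seq(0, 50, by = bin_size), labels = FALSE)

data.frame(position, bin, start, end, bin_cut)
```

In the pipeline above, this would amount to replacing the bin line with bin = (position - 1) %/% 10 + 1. The counts will then differ from the printed output wherever the data contain a multiple of 10.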