I'm a newbie to data.table. I'm curious about when the .SDcols argument gets processed in the case below. As I read the documentation, the value column should not be passed into .SD, since I only listed v1 in .SDcols. So, in theory, shouldn't this just throw an error? I don't really understand what's happening.
library(data.table)
dt <- data.table(
group = c("A", "A", "B", "B", "B"),
value = c(3, 6, 1, 2, 4),
v1 = c(1,2,3,4,5)
)
dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
#> group v1
#> <char> <num>
#> 1: A 1
#> 2: B 3
Created on 2025-06-25 with reprex v2.1.1
My guess at the processing order is: by first, then .SD, then .SDcols.
Looking forward to the clarification, thanks!
Let's walk through the process step by step.
Content of .SD by group:
dt[, by=group,.SD, .SDcols = "v1"]
group v1
<char> <num>
1: A 1
2: A 2
3: B 3
4: B 4
5: B 5
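A quick way to confirm what .SD actually holds in each group is to print its column names. This is a minimal sketch reusing the dt from the question; the result column names sd_names and n are just illustrative choices.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

# For each group, report which columns .SD holds and how many rows
# the group's subset has. Only the .SDcols column (v1) should appear.
sd_cols <- dt[, .(sd_names = paste(names(.SD), collapse = ","), n = .N),
              by = group, .SDcols = "v1"]
print(sd_cols)
```

Both groups report just v1, so value really is not part of .SD.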
OK, that's expected. Let's add value now.
dt[, by=group, cbind(value, .SD), .SDcols = "v1"]
group value v1
<char> <num> <num>
1: A 3 1
2: A 6 2
3: B 1 3
4: B 2 4
5: B 4 5
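To check that value is reachable in j even though it is not in .SD, you can ask both questions in the same call. A small sketch using the same dt; value_in_SD and group_value_sum are made-up names for the result columns.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

# .SD holds only the .SDcols column (v1), yet `value` can still be
# referenced by name in j, where it is the current group's slice.
check <- dt[, .(value_in_SD     = "value" %in% names(.SD),
                group_value_sum = sum(value)),
            by = group, .SDcols = "v1"]
print(check)
```

value_in_SD is FALSE for both groups, while group_value_sum is 9 for A (3 + 6) and 7 for B (1 + 2 + 4), i.e. value is evaluated per group.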
Being able to reference value like this means the group's columns are available in the j scope alongside .SD. Let's add the filter condition.
dt[, by=group, cbind(filter=value==min(value), .SD), .SDcols = "v1"]
group filter v1
<char> <lgcl> <num>
1: A TRUE 1
2: A FALSE 2
3: B TRUE 3
4: B FALSE 4
5: B FALSE 5
Pretty easy to see what's going to happen now :-)
dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
group v1
<char> <num>
1: A 1
2: B 3
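The one-liner can also be written in two explicit steps, which makes the split visible: the logical filter is computed from the group's value, then used to subset the v1-only .SD. A sketch using the same dt; keep is just a local variable name chosen for illustration.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

oneliner <- dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]

explicit <- dt[, {
  keep <- value == min(value)  # computed from this group's `value`, which is not in .SD
  .SD[keep]                    # applied as a row filter to the v1-only .SD
}, by = group, .SDcols = "v1"]
print(explicit)
```

Both forms give the same two rows (A with v1 = 1, B with v1 = 3).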
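So the order is more like this:
1. Grouping is done based on by first.
2. .SD is built from the current group's subset of rows, keeping only the .SDcols columns, and "added" to the j environment, alongside the group's slices of the other columns that j refers to (here, value).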
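The steps above can be re-enacted by hand in plain R to see why value is found. This is only a conceptual sketch, not data.table's actual internals (the real grouping is done in C and is more involved); it hard-codes the column names from the question.

```r
library(data.table)

dt <- data.table(
  group = c("A", "A", "B", "B", "B"),
  value = c(3, 6, 1, 2, 4),
  v1    = c(1, 2, 3, 4, 5)
)

# Conceptual re-enactment of
#   dt[, .SD[value == min(value)], by = group, .SDcols = "v1"]
# NOT how data.table is implemented; it only mirrors the order of steps.
manual <- rbindlist(lapply(unique(dt$group), function(g) {
  rows  <- which(dt$group == g)   # 1. split the rows on `by` first
  SD    <- dt[rows, .(v1)]        # 2. .SD: this group's rows, .SDcols column only
  value <- dt$value[rows]         # 3. `value`, referenced in j, is this group's slice
  # 4. evaluate j: `value` is not a column of SD, so `[.data.table`
  #    looks it up in the calling environment and finds the local `value`
  data.table(group = g, SD[value == min(value)])
}))
print(manual)
```

The result matches the data.table call: A with v1 = 1 and B with v1 = 3.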