Consider 'Table 1' below:
id | x |
---|---|
1 | A |
1 | B |
1 | C |
2 | B |
2 | C |
3 | A |
I want to query Table 1 to produce the following:
id | x | grps |
---|---|---|
1 | A | B, C |
1 | B | A, C |
1 | C | A, B |
2 | B | C |
2 | C | B |
3 | A |
Using string_agg
and grouping by id produces "A, B, C" for all id 1 rows, and I want to exclude the x value from grps for that row (where x = "A", grps should equal "B, C", etc.) Is there some self-join/string_agg/grouping trick?
Edit: My current query is below:
library(duckdb)
#> Loading required package: DBI
df <- tibble::tibble(
id = c(1, 1, 1, 2, 2, 3),
x = c("A", "B", "C", "B", "C", "A")
)
con <- dbConnect(duckdb())
duckdb_register(con, "tab1", df)
q <- "
WITH tab_grps AS (
SELECT id,
string_agg(x, ',') AS grps
FROM tab1
GROUP BY id
)
SELECT tab1.id,
tab1.x,
tab_grps.grps
FROM tab1
LEFT JOIN tab_grps ON tab1.id = tab_grps.id
"
dbGetQuery(con, q)
#> id x grps
#> 1 1 A A,B,C
#> 2 1 B A,B,C
#> 3 1 C A,B,C
#> 4 2 B B,C
#> 5 2 C B,C
#> 6 3 A A
dbDisconnect(con)
Created on 2024-05-22 with reprex v2.0.2
It sounds like you may want a Window Function with an EXCLUDE CURRENT ROW clause.
duckdb.sql("""
from (values
(1, 'A'), (1, 'B'), (1, 'C'),
(2, 'B'), (2, 'C'), (3, 'A')
) t(id, x)
select
*,
string_agg(x) over(
partition by id
order by id
rows between unbounded preceding and unbounded following
exclude current row
) as grps
""")
┌───────┬─────────┬─────────┐
│ id │ x │ grps │
│ int32 │ varchar │ varchar │
├───────┼─────────┼─────────┤
│ 1 │ A │ B,C │
│ 1 │ B │ A,C │
│ 1 │ C │ A,B │
│ 2 │ B │ C │
│ 2 │ C │ B │
│ 3 │ A │ NULL │
└───────┴─────────┴─────────┘