sqlstring-aggregationduckdb

Using string_agg to create distinct groups by row


Consider 'Table 1' below:

id x
1 A
1 B
1 C
2 B
2 C
3 A

I want to query Table 1 to produce the following:

id x grps
1 A B, C
1 B A, C
1 C A, B
2 B C
2 C B
3 A

Using string_agg and grouping by id produces "A, B, C" for all id 1 rows, and I want to exclude the x value from grps for that row (where x = "A", grps should equal "B, C", etc.) Is there some self-join/string_agg/grouping trick?

Edit: My current query is below:

library(duckdb)
#> Loading required package: DBI
df <- tibble::tibble(
  id = c(1, 1, 1, 2, 2, 3),
  x = c("A", "B", "C", "B", "C", "A")
)

con <- dbConnect(duckdb())
duckdb_register(con, "tab1", df)
q <- "
WITH tab_grps AS (
  SELECT id,
         string_agg(x, ',') AS grps
  FROM tab1
  GROUP BY id
)
SELECT tab1.id,
       tab1.x,
       tab_grps.grps
FROM tab1
LEFT JOIN tab_grps ON tab1.id = tab_grps.id 
"
dbGetQuery(con, q)
#>   id x  grps
#> 1  1 A A,B,C
#> 2  1 B A,B,C
#> 3  1 C A,B,C
#> 4  2 B   B,C
#> 5  2 C   B,C
#> 6  3 A     A
dbDisconnect(con)

Created on 2024-05-22 with reprex v2.0.2


Solution

  • It sounds like you may want a Window Function with an EXCLUDE CURRENT ROW clause.

    duckdb.sql("""
    from (values 
       (1, 'A'), (1, 'B'), (1, 'C'), 
       (2, 'B'), (2, 'C'), (3, 'A')
    ) t(id, x)
    
    select
       *,
       string_agg(x) over(
          partition by id
          order by id
          rows between unbounded preceding and unbounded following  
          exclude current row
       ) as grps
    """)
    
    ┌───────┬─────────┬─────────┐
    │  id   │    x    │  grps   │
    │ int32 │ varchar │ varchar │
    ├───────┼─────────┼─────────┤
    │     1 │ A       │ B,C     │
    │     1 │ B       │ A,C     │
    │     1 │ C       │ A,B     │
    │     2 │ B       │ C       │
    │     2 │ C       │ B       │
    │     3 │ A       │ NULL    │
    └───────┴─────────┴─────────┘