sqldrizzle

How to avoid SQL fanout in this situation?


I may be incorrect in saying it's a fanout, but I think that's what it is. To start with, I'm using Drizzle ORM. Essentially, I have 4 tables in question: one is a bills table, another is a UsersToBills junction table, the third is the users table, and the fourth is an expenses table(I don't have a junction table for expenses because I store the billId inside the Expenses schema). To keep it brief, a bill can have both multiple users and multiple expenses associated with it. What I'm trying to do is query an array of all the bills, where each bill has its information along with the arrays of both users and expenses associated with it: In pseudocode, this is my desired structure:

Bill {
     // bill info skipped for brevity
     users: User[],
     expenses: Expense[]
}[]

Now, here's the problem: lets say a bill has two users and one expense associated with it. The way this query currently works, it correctly returns the two users, but then it also returns 2 copies of the one expense (I'm guessing that's because there are two users?) Obviously, what I would want, is for it to only return the unique number of expenses. Anyways, any help with getting this query correct would be greatly appreciated, The query is attached directly below. Thanks.

bill = await db
.select({
  id: billsTable.id,
  createdBy: billsTable.createdBy,
  createdAt: billsTable.createdAt,
  name: billsTable.name,
  balance: billsTable.balance,
  users: sql<
    User[]
  >`json_agg(json_build_object('id', ${usersTable.id}, 'email', ${usersTable.email}, 
              'name', ${usersTable.name}, 'image', ${usersTable.image}))`,
  expenses: sql<
    Expense[]
  >`json_agg(json_build_object('id', ${expensesTable.id}, 'amountSpent', ${expensesTable.amountSpent},
            'createdAt', ${expensesTable.createdAt}, 'createdBy', ${expensesTable.createdBy},
            'note', ${expensesTable.note})) FILTER (WHERE ${expensesTable.id} IS NOT NULL)`,
})
.from(billsTable)
.leftJoin(billsToUsers, eq(billsTable.id, billsToUsers.billId))
.leftJoin(usersTable, eq(billsToUsers.userId, usersTable.id))
.leftJoin(expensesTable, eq(expensesTable.billId, billsTable.id))
.where(
  and(
    eq(billsTable.id, billsToUsers.billId),
    eq(billsTable.id, params.billId)
  )
)
.groupBy(billsTable.id);


Solution

  • The problem here is a pretty common mistake: you are joining everything back to billsTable and only aggregating afterwards, so you end up with a cross-join of all one-to-many and many-to-many relationships

    I don't know this ORM, but the idea is generally that you would pre-aggregate before joining.

    In raw SQL that would be something like

    select
      b.id,
      b.createdBy,
      b.createdAt,
      b.name,
      b.balance,
      u.users,
      e.expenses
    from billsTable b
    left join (
        select
          bu.billId,
          json_agg(
            json_build_object(
              'id', u.id,
              'email', u.email, 
              'name', u.name,
              'image', u.image
            )
          ) as users
        from billsToUsers bu
        join usersTable u on bu.userId = u.id
        group by
          bu.billId
    ) u on b.id = u.billId
    left join (
        select
          e.billId,
          json_agg(
            json_build_object(
              'id', e.id,
              'amountSpent', e.amountSpent,
              'createdAt', e.createdAt,
              'createdBy', e.createdBy,
              'note', e.note
            )
          ) as expenses
        from expensesTable e
        group by
          e.billId
    ) e on e.billId = b.id
    where b.id = @billId;
    

    Depending on the DBMS you can also do a lateral join instead, again putting the grouping inside the subquery.