sqlpostgresqlgogo-gormsql-in

What is the most performant way to rewrite a large IN clause?


I wrote an API using go and gorm that runs calculations on our database and returns the results.

I just hit the parameter limit for an IN condition when using an aggregate. Example query:

SELECT SUM(total_amount) from Table where user_id in(...70k parameters) group by user_id

One of my current edge cases has > 65535 user ids so my Postgres client is throwing an error:

got 66037 parameters but PostgreSQL only supports 65535 parameters

I'm not sure what the best way to approach this is. One that will handle the large amount of parameters for this edge case while not affecting my typical use case. Do I chunk the ids and iterate through multiple queries storing it in memory until I have all the data I need? Use ANY(VALUES)...

Obviously from the query I have very limited knowledge of Postgres so any help would be incredibly appreciated.


Solution

  • You can replace user_id IN (value [, ...]) with one of:

    user_id IN (subquery)
    user_id = ANY (subquery)
    user_id = ANY (array expression)
    

    Neither subqueries nor arrays exhibit the same limitation. The shortest input syntax would be:

    user_id = ANY ('{1,2,3}'::int[])  -- make array type match type of user_id
    

    Details and more options:

    Or you might create a (temporary) table tmp_usr(user_id int), import to it, maybe with SQL COPY or psql \copy instead of INSERT for best performance with big sets and then join to the table like:

    SELECT user_id, SUM(total_amount)
    FROM   tbl
    JOIN   tmp_usr USING (user_id)
    GROUP  BY user_id;
    

    BTW, you'll want to include user_id in the SELECT list to identify sums.