sqlpostgresqlaggregate-functionswindow-functionsaggregate-filter

Subquery with the same WHERE clause as query


In Postgres 9.4, I have a table which contains user's emails sent by different senders. I need to generate the list of the recent distinct senders, I'm using following query:

SELECT DISTINCT ON (tableA.senderName) 
       tableA.senderName,tableA.received,tableA.someOtherColumn
FROM tableA 
WHERE tableA.received BETWEEN timeA AND timeB
ORDER BY tableA.senderName,tableA.received DESC

Since tableA contains all messages of all senders, this query gives me only the recent messages of the distinct senders and works quite well.

Actually, I have the whole query (except the WHERE clause, since timeA and timeB differ everytime) in a view and then I just SELECT * from this view using the particular timeA and timeB in the WHERE clause.

Now consider the following query:

SELECT SUM(ts_rank_cd(tableA.precompTSVector,constantTSQuery)) AS score 
FROM tableA 
WHERE tableA.precomTSVector @@ constantTSQuery
AND tableA.sender = someParticularSender
AND tableA.received BETWEEN timeA AND timeB

which gives the total score of searching some constant TS query against all messages of a particular sender within the given time range.

I would like to incorporate this query (perhaps as a subquery) into my view so that I can also see the "score" for each distinct sender. The obvious problem is that the "score query" is dependent on the actual timeA and timeB in the WHERE clause. These times are not known until the view itself is queried.

EDIT: A different problem is the someOtherColumn being also selected in the first query (and potentially some other columns which I would also like to incorporate in the view) - since the an aggregate function is being used.

Any ideas how to achieve this? Or perhaps some different approach is welcome!


Solution

  • SELECT DISTINCT ON (senderName) 
           senderName, received -- add more columns as you please
         , SUM(ts_rank_cd(precompTSVector,constantTSQuery)) 
             FILTER (WHERE precomTSVector @@ constantTSQuery)
             OVER (PARTITION BY senderName) AS score 
    FROM   tableA
    WHERE  received BETWEEN timeA AND timeB
    ORDER  BY senderName, received DESC;
    

    First you need to get DISTINCT ON right:

    DISTINCT ON is applied after aggregate functions, that's why you can combine both in a single SELECT. Consider the sequence of events in a SELECT query:

    About the new aggregate FILTER clause (requires Postgres 9.4):

    You need to run this as window function, if you want to add more (unaggregated) columns.

    If received can be NULL, you'll want to use DESC NULLS LAST:

    Depending on data distribution and cardinalities, other query techniques may or may not be faster: