In Postgres 9.4, I have a table which contains user's emails sent by different senders. I need to generate the list of the recent distinct senders, I'm using following query:
SELECT DISTINCT ON (tableA.senderName)
tableA.senderName,tableA.received,tableA.someOtherColumn
FROM tableA
WHERE tableA.received BETWEEN timeA AND timeB
ORDER BY tableA.senderName,tableA.received DESC
Since tableA
contains all messages of all senders, this query gives me only the recent messages of the distinct senders and works quite well.
Actually, I have the whole query (except the WHERE
clause, since timeA
and timeB
differ everytime) in a view and then I just SELECT *
from this view using the particular timeA
and timeB
in the WHERE
clause.
Now consider the following query:
SELECT SUM(ts_rank_cd(tableA.precompTSVector,constantTSQuery)) AS score
FROM tableA
WHERE tableA.precomTSVector @@ constantTSQuery
AND tableA.sender = someParticularSender
AND tableA.received BETWEEN timeA AND timeB
which gives the total score of searching some constant TS query against all messages of a particular sender within the given time range.
I would like to incorporate this query (perhaps as a subquery) into my view so that I can also see the "score" for each distinct sender. The obvious problem is that the "score query" is dependent on the actual timeA
and timeB
in the WHERE clause. These times are not known until the view itself is queried.
EDIT: A different problem is the someOtherColumn
being also selected in the first query (and potentially some other columns which I would also like to incorporate in the view) - since the an aggregate function is being used.
Any ideas how to achieve this? Or perhaps some different approach is welcome!
SELECT DISTINCT ON (senderName)
senderName, received -- add more columns as you please
, SUM(ts_rank_cd(precompTSVector,constantTSQuery))
FILTER (WHERE precomTSVector @@ constantTSQuery)
OVER (PARTITION BY senderName) AS score
FROM tableA
WHERE received BETWEEN timeA AND timeB
ORDER BY senderName, received DESC;
First you need to get DISTINCT ON
right:
DISTINCT ON
is applied after aggregate functions, that's why you can combine both in a single SELECT
. Consider the sequence of events in a SELECT
query:
About the new aggregate FILTER
clause (requires Postgres 9.4):
You need to run this as window function, if you want to add more (unaggregated) columns.
If received
can be NULL, you'll want to use DESC NULLS LAST
:
Depending on data distribution and cardinalities, other query techniques may or may not be faster: