I use a query in the Stack Exchange Data Explorer (SEDE).
This is my query:
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, A.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 1
UNION
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, B.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
RIGHT JOIN posts B
ON A.ParentId = B.Id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 2
In the code above, posts in Stack Overflow have 2 types: question and answer. Questions(PostTypeId
is 1 in database schema) have the tags, but the answers(PostTypeId
is 2 in database schema) do not have the tags.
Answers belong to questions through the ParentId
.
But the efficiency of the my query above is too low, I only can get some (using user id) posts' tags.
How can I get all users' posts' tags within the SEDE time out?
Several things:
Users
table but ID, then don't include that table. It chews up cycles and Posts.OwnerUserId
is the same thing.UNION
statements if possible (it is in this case).UNION
statements, use UNION ALL
if possible (it is in this case). This saves the engine from having to do duplicate checks.So, here is the execution plan for the original query:
Here is a streamlined plan:
And the query that corresponds to it:
SELECT TOP 50000
p.OwnerUserId AS [User]
, p.Id AS [Post Id]
, CASE WHEN p.PostTypeId = 1 THEN 'Q' ELSE 'A' END AS [Type]
, COALESCE (p.Title, ownr.Title) AS [Title]
, p.Body
, COALESCE (p.Tags, ownr.Tags) AS [Tags]
, p.CreationDate
FROM Posts p
LEFT JOIN Posts ownr ON ownr.ID = p.ParentId
WHERE p.OwnerUserId = ##UserId##
AND p.PostTypeId IN (1, 2) -- Question, answer
ORDER BY p.OwnerUserId DESC, p.CreationDate
-- which also gives more readable results -- especially when the WHERE
clause is removed.
But, if you can limit by, say, user before hand; you get an even more efficient query:
WITH usersOfInterest AS (
SELECT TOP 10
u.Id AS [UserId]
, u.DisplayName
FROM Users u
ORDER BY u.Reputation DESC
)
SELECT
[User] = 'site://u/' + CAST(u.UserId AS NVARCHAR) + '|' + CAST(u.UserId AS NVARCHAR)
, p.Id AS [Post Id]
, CASE WHEN p.PostTypeId = 1 THEN 'Q' ELSE 'A' END AS [Type]
, COALESCE (p.Title, ownr.Title) AS [Title]
, p.Body
, COALESCE (p.Tags, ownr.Tags) AS [Tags]
, p.CreationDate
FROM usersOfInterest u
INNER JOIN Posts p ON u.UserId = p.OwnerUserId
LEFT JOIN Posts ownr ON ownr.Id = p.ParentId
WHERE p.PostTypeId IN (1, 2) -- Question, answer
ORDER BY u.UserId DESC, p.CreationDate
(This query adds a convenient hyperlink to the user id.)
Note that just the top 10 users have more than 50K posts.