I use a query in the Stack Exchange Data Explorer (SEDE).
This is my query:
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, A.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 1
UNION
SELECT A.Id
, A.PostTypeId
, A.Title
, A.Body
, A.ParentId
, B.Tags
, A.CreationDate
FROM posts A
LEFT JOIN users U
ON A.OwnerUserId = U.id
RIGHT JOIN posts B
ON A.ParentId = B.Id
WHERE U.Id = ##UserId##
AND A.PostTypeId = 2
In the code above, posts in Stack Overflow have 2 types: question and answer. Questions(PostTypeId is 1 in database schema) have the tags, but the answers(PostTypeId is 2 in database schema) do not have the tags.
Answers belong to questions through the ParentId.
But the efficiency of the my query above is too low, I only can get some (using user id) posts' tags.
How can I get all users' posts' tags within the SEDE time out?
Several things:
Users table but ID, then don't include that table. It chews up cycles and Posts.OwnerUserId is the same thing.UNION statements if possible (it is in this case).UNION statements, use UNION ALL if possible (it is in this case). This saves the engine from having to do duplicate checks.So, here is the execution plan for the original query:
Here is a streamlined plan:
And the query that corresponds to it:
SELECT TOP 50000
p.OwnerUserId AS [User]
, p.Id AS [Post Id]
, CASE WHEN p.PostTypeId = 1 THEN 'Q' ELSE 'A' END AS [Type]
, COALESCE (p.Title, ownr.Title) AS [Title]
, p.Body
, COALESCE (p.Tags, ownr.Tags) AS [Tags]
, p.CreationDate
FROM Posts p
LEFT JOIN Posts ownr ON ownr.ID = p.ParentId
WHERE p.OwnerUserId = ##UserId##
AND p.PostTypeId IN (1, 2) -- Question, answer
ORDER BY p.OwnerUserId DESC, p.CreationDate
-- which also gives more readable results -- especially when the WHERE clause is removed.
But, if you can limit by, say, user before hand; you get an even more efficient query:
WITH usersOfInterest AS (
SELECT TOP 10
u.Id AS [UserId]
, u.DisplayName
FROM Users u
ORDER BY u.Reputation DESC
)
SELECT
[User] = 'site://u/' + CAST(u.UserId AS NVARCHAR) + '|' + CAST(u.UserId AS NVARCHAR)
, p.Id AS [Post Id]
, CASE WHEN p.PostTypeId = 1 THEN 'Q' ELSE 'A' END AS [Type]
, COALESCE (p.Title, ownr.Title) AS [Title]
, p.Body
, COALESCE (p.Tags, ownr.Tags) AS [Tags]
, p.CreationDate
FROM usersOfInterest u
INNER JOIN Posts p ON u.UserId = p.OwnerUserId
LEFT JOIN Posts ownr ON ownr.Id = p.ParentId
WHERE p.PostTypeId IN (1, 2) -- Question, answer
ORDER BY u.UserId DESC, p.CreationDate
(This query adds a convenient hyperlink to the user id.)
Note that just the top 10 users have more than 50K posts.