sqlperformancebadgedataexplorer

Get "rank" of badge on SO for my user - query is slow - speedup possible?


I was curious the how many people got the python gold badge before me - I am able to get this information

python    2019-01-02 09:09:15   Gold    454

with this (slow running) query:

(I was unable to single/cross login with my main user on data explorer, hence the anonymous login)

-- insert your user id here:
declare @uid int = 7505395

-- get all badges of all users
select Name, Date, [Gold/Silver/Else], [Row#] from ( 
  SELECT Name, 
         Date, 
         userId,
         case when class = 1 then 'Gold'
              when class = 2 then 'Silver'
              when class = 3 then 'Bronze'
              else convert(varchar(10), class)
              end as 'Gold/Silver/Else',
              ROW_NUMBER() OVER(PARTITION BY name, class ORDER BY date ASC) AS Row# 
  FROM badges
  WHERE 1 = 1
    -- you can restrict this further, f.e. for looking only by gold badges
    -- and Class = 1  -- gold == 1, silver == 2, bronze == 3
    -- -- or for certain named badges
    -- and name like 'python%' 
) as tmp
where userID = @uid 
ORDER by name asc, Date asc

(The query as is gives me all my badges with how many got it before me and has to sort through all possible badges)

Question:

I tried to CTE (only errors, did not work) and my sql skills are rusty - how to speedup this query?


Solution

  • The problem is that the Table does not seem to have indices that are useful for this. We get execution plans like:

    bad plan

    -- Index scans are suboptimal. We want Index seeks.

    Still, you can cut the time almost in half by:

    1. Preselecting the user's badges.
    2. Using a correlated subquery for the rank.
    3. Using Id as a proxy for Date. (Ids are unique, increasing, and often faster to sort.)

    Also note:

    1. The use of the magic ##UserId:INT## parameter.
    2. There are only 3 values for the Class column.
    3. You can cut the query time down by another few seconds by omitting the ORDER BY clause.

    Anyway, this query performs better:

    WITH zUsersBadges AS (
        SELECT  b.Id
                , b.UserId
                , b.Name
                , b.Date
                , b.Class
                , [Badge Class] = (
                    CASE    WHEN b.Class = 1 THEN 'Gold'
                            WHEN b.Class = 2 THEN 'Silver'
                            WHEN b.Class = 3 THEN 'Bronze'
                    END
                )
                , [Is tag badge] = IIF (b.TagBased = 1, 'Yes', 'No')
        FROM    Badges b
        WHERE   b.UserId = ##UserId:INT##
    )
    SELECT      ub.Name                 AS [Badge Name]
                , ub.[Badge Class]
                , ub.[Is tag badge]
                , ub.Date               AS [Date Earned]
                , [In Top N of earners] = (
                    SELECT  COUNT (ob.ID)
                    FROM    Badges ob
                    WHERE   (ob.Name = ub.Name  AND  ob.Class = ub.Class  AND  ob.Id <= Ub.Id)  -- Faster but may give slightly higher rank
                    --WHERE   (ob.Name = ub.Name  AND  ob.Class = ub.Class  AND  ob.Date <= Ub.Date)  -- Slower, but gives exact rank.
                )
    FROM        zUsersBadges ub
    ORDER BY    ub.Name, ub.Date
    

    Update: This query performs even better because it aggregates multiply-earned badges:

    WITH zUsersBadges AS (
        SELECT      b.UserId
                    , b.Name
                    , minId = MIN (b.Id)
                    , [First Earned] = MIN (b.Date)
                    , [Earned N times] = COUNT (b.Date)
                    , b.Class
                    , [Badge Class] = (
                        CASE    WHEN b.Class = 1 THEN 'Gold'
                                WHEN b.Class = 2 THEN 'Silver'
                                WHEN b.Class = 3 THEN 'Bronze'
                        END
                    )
                    , [Is tag badge] = IIF (b.TagBased = 1, 'Yes', 'No')
        FROM        Badges b
        WHERE       b.UserId = ##UserId:INT##
        GROUP BY    b.UserId, b.Class, b.Name, b.TagBased
    )
    SELECT      ub.Name                 AS [Badge Name]
                , ub.[Badge Class]
                , ub.[Is tag badge]
                , ub.[First Earned]
                , ub.[Earned N times]
                , [In Top N of earners] = (
                    SELECT  COUNT (ob.ID)
                    FROM    Badges ob
                    WHERE   (ob.Class = ub.Class  AND  ob.Id <= Ub.minId  AND  ob.Name = ub.Name)  -- Faster but may give slightly higher rank
                    --WHERE   (ob.Class = ub.Class  AND  ob.Date <= Ub.[First Earned]  AND  ob.Name = ub.Name)  -- Faster but may give slightly higher rank
                )
    FROM        zUsersBadges ub
    ORDER BY    ub.Name, ub.[First Earned]