sqlsql-server-2017dataexplorer

Select SO reputation per post for a given user


I am currently working on an SQL query for SEDE which selects all the posts of an user from Stack Overflow and displays how much reputation each post generated.

What I can't wrap my head around is how to count all the upvotes, downvotes and accepts for every post and then to calculate the overall reputation gain of each post.

So I would to group by the Post id display the Total score and show how much reputation is gained overall.

The reputation each vote produces can be seen here:

+-----------+----------+--------+
| Post type | Question | Answer |
+-----------+----------+--------+
| Upvote    |    5     |   10   |
+-----------+----------+--------+
| Downvote  |    2     |    2   |
+-----------+----------+--------+
| Accept    |    5     |   15   |
+-----------+----------+--------+

The database schema I am targeting can be found here.

My query so far looks like this:

select 
  p.Id as 'Post id', 
  pt.Name as 'Post type',
  p.Score as 'Total score',
  (
    case vt.Id
      when 1 then 'Accept'
      when 2 then 'Upvote'
      else 'Downvote'
      end
  ) as 'Reputation type'
from 
  Posts p
join 
  Votes v
on
  v.PostId = p.Id
join 
  VoteTypes vt
on 
  vt.Id = v.VoteTypeId
join
  PostTypes pt
on 
  pt.Id = p.PostTypeId
where
  p.OwnerUserId = ##UserId##
and 
  vt.Id in (1, 2, 3)
order by
  p.Score,
  vt.Id
asc

And the output it produces looks something like this:

I tried to group by Vote type id:

group by  
  vt.id

so that I could at least find out how many different votes each post accumulated by using something along the lines of:

select
  ....
  count(vt.id)

but then I get an error that the Posts.Id column can't be resolved:

The current runnable but incomplete query can be found here (You have to input your user id for it to run)


Solution

  • Reputation per post is a bit more complicated than that.
    In addition to Up/Down/Accept votes, there are also:

    1. Bounties,
    2. Rep caps,
    3. The user's rep floor (no user can have less than 1 rep (usually)).
    4. Community Wiki.
    5. Spam or Abusive flags.
    6. Upvotes count for different amounts on questions, depending on the site.
    7. Many posts have no votes at all, so the SQL needs to reflect that. (Left join, COALESCE, etc.)
    8. Do you count an approved suggested edit on a post (2 pts rep)?
    9. Probably something else that I'm forgetting.


    Depending on what you are really after, you might consider using this API method instead (and then grouping the results by post_id).

    Anyway, here's your query tuned up a bit:

    SELECT      p.Id      AS [Post Link]
                , pt.Name AS 'Post type'
                , p.Score AS 'Total score'
                , COALESCE (vtStats.AcceptRep, 0)   AS [Accept Rep]
                , COALESCE (vtStats.numUpvotes, 0)  *  (
                    CASE  p.PostTypeId
                        WHEN  1  THEN  5  -- Questions on most sites
                        WHEN  2  THEN 10  -- Answers
                        ELSE  1  -- Should not happen, but don't zero out
                    END
                ) AS [Up Vt Rep]
                , COALESCE (vtStats.DwnVtRep, 0)    AS [Dwn Vt Rep]
    FROM        Posts p
    LEFT JOIN   (
        SELECT      v.PostId
                    , SUM (CASE v.VoteTypeId    WHEN 1  THEN 15  ELSE 0  END)   AS AcceptRep
                    , SUM (CASE v.VoteTypeId    WHEN 3  THEN -2  ELSE 0  END)   AS DwnVtRep
                    , SUM (CASE v.VoteTypeId    WHEN 2  THEN  1  ELSE 0  END)   AS numUpvotes -- Needs special handling
        FROM        Votes v
        WHERE       v.VoteTypeId IN (1, 2, 3 )
        GROUP BY    v.PostId
    )
    AS vtStats  ON  vtStats.PostId = p.Id
    INNER JOIN  Posttypes pt    ON pt.Id = p.PostTypeId
    WHERE       p.OwnerUserId = ##UserId:Int##
    ORDER  BY   p.Score DESC
                , [Accept Rep] DESC
                , pt.Name