sqlstackexchange

Top [TAG] Answerers in the Last 30 days for a given location


How would you put together a SQL query for data.stackexchange that will display the most active users (in terms of answers given) for a tag in a given location?

EG. something similar to the top 30 listed here https://stackoverflow.com/tags/ruby-on-rails-3/topusers but location specific.

So top Ruby Answerers in the last 30 days in Berlin etc

Thanks!


Solution

  • So, after looking at the database schema this is the query I came up with.

    -- Top 10 Ruby Answerers in the last 30 days in Berlin based on score
    select top 10
      u.displayname, 
      number_of_answers = count(*), 
      total_score = sum(p.score)
    from
      users u                                 
    join 
      posts p on p.owneruserid = u.id         -- joined to get answer posts
    join 
      posts pp on p.parentid = pp.id          -- post parent is the question
    join 
      posttags pt on pt.postid = pp.id        -- tags for post parent
    join 
      tags t on t.id = pt.tagid               -- tags for tag name
    where 
      t.tagname like '%ruby%'                 -- tags to filter for
     and                                      -- includes everything ruby inc. rails
      p.creationdate > (getdate()-30)         -- past 30 days
     and 
      u.location like '%Berlin%'              -- locations differ in format
    group by 
      u.displayname
    order by 
      3 desc;                                 -- order by total score for "best" answerers
                                              -- order by 2 (count) to get most active
    

    I'm not an expert in the data explorer schema so it's possible that the query isn't quite correct and there are some caveats: the date filter applies to the question and not the answer so it's possible that there are users with more answers in the last 30 days overall if they have answered older questions, and also, the location is a really unreliable field as many users haven't specified location at all. It's probably as close as it's possible to get though.

    The data explorer isn't that hard to use - experiment a bit with it and you'll realize how the tables are connected. It's a good exercise :)

    Here's the query