google-bigquery stackexchange-api

What are the most recent StackOverflow datasets on BigQuery?


What are the most recent StackOverflow datasets on BigQuery? Two datasets I am aware of are very much out of date. Are there more recent ones?

  1. bigquery-public-data.stackoverflow, referenced at https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow?project=stackoverflowquery-407019 (and also in the "Hello, world" Google BigQuery tutorial...), has not been updated since 24 Nov 2022 (see the screenshot).
  2. The datasets mentioned by Felipe Hoffa here have 2019 as their most recent data (the results of the query SELECT DISTINCT quarter FROM fh-bigquery.stackoverflow_archive_questions.merged are below).


Row((datetime.date(2017, 6, 1),), {'quarter': 0})
Row((datetime.date(2017, 3, 1),), {'quarter': 0})
Row((datetime.date(2017, 12, 1),), {'quarter': 0})
Row((datetime.date(2018, 6, 1),), {'quarter': 0})
Row((datetime.date(2019, 3, 1),), {'quarter': 0})
Row((datetime.date(2018, 9, 1),), {'quarter': 0})
Row((datetime.date(2017, 9, 1),), {'quarter': 0})
Row((datetime.date(2018, 3, 1),), {'quarter': 0})
Row((datetime.date(2018, 12, 1),), {'quarter': 0})
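
Both freshness checks described above can be scripted. The following is a minimal sketch, assuming the google-cloud-bigquery package is installed and default credentials are configured. The helper names latest_quarter and last_modified are illustrative and do not come from the question. The client argument is expected to be a google.cloud.bigquery.Client.

```python
# Sketch: check how fresh the two datasets mentioned in the question are.
# Assumption: `client` is a google.cloud.bigquery.Client created by the caller.
# The helper names below are hypothetical, chosen for this example only.


def latest_quarter(client, table="fh-bigquery.stackoverflow_archive_questions.merged"):
    """Return the most recent value of the `quarter` column in `table`."""
    # Backticks keep the hyphenated project ID unambiguous in standard SQL.
    sql = f"SELECT MAX(quarter) AS latest FROM `{table}`"
    rows = list(client.query(sql).result())
    return rows[0]["latest"]


def last_modified(client, table="bigquery-public-data.stackoverflow.posts_questions"):
    """Return the last-modified timestamp stored in the table's metadata."""
    return client.get_table(table).modified


# Usage (needs network access and a project to bill the query to):
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   print(latest_quarter(client))
#   print(last_modified(client))
```

Reading the metadata with get_table does not scan any table data, so it is the cheaper of the two checks.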

Solution

  • According to the details of the bigquery-public-data.stackoverflow.posts_questions table, it was last updated on Nov 25, 2022, 5:10:40 AM UTC+5:30.

    This is a known issue, and a request has already been filed for it. You can vote for it by clicking +1, and click the STAR mark to receive updates on it.

    As a workaround, if you want to query the data directly in the BQ console, you can store Stack Overflow's open-source dataset files in Google Cloud Storage, where they can be queried directly from the BQ console.

    For more information, you can refer to these links: link1, link2, and link3.
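
One way to set up the workaround is an external table over the files in Cloud Storage. The sketch below only builds the DDL statement, which can then be pasted into the BQ console. It assumes the dump files have already been converted to newline-delimited JSON and uploaded, since BigQuery does not load XML directly. The project, dataset, table, and bucket names are hypothetical placeholders, and external_table_ddl is an illustrative helper, not part of any library.

```python
# Sketch: build a CREATE EXTERNAL TABLE statement for files stored in GCS.
# Assumption: the files are newline-delimited JSON. No column list is given,
# so BigQuery is left to auto-detect the schema.


def external_table_ddl(table_id, gcs_uri, fmt="NEWLINE_DELIMITED_JSON"):
    """Return BigQuery DDL for an external table backed by `gcs_uri`."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        f"OPTIONS (\n"
        f"  format = '{fmt}',\n"
        f"  uris = ['{gcs_uri}']\n"
        f");"
    )


# Hypothetical names: replace them with your own project, dataset, and bucket.
ddl = external_table_ddl(
    "my-project.stackoverflow_dump.posts",
    "gs://my-bucket/stackoverflow/posts/*.json",
)
print(ddl)
```

Once the statement has been run, the table can be queried like any other, but every query reads the files from Cloud Storage, so it is likely to be slower than a native table. Loading the files into a native table is the alternative if the data is queried often.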