What are the most recent StackOverflow datasets on BigQuery? Two datasets I am aware of are very much out of date. Are there more recent ones?
bigquery-public-data.stackoverflow, referenced on https://console.cloud.google.com/marketplace/product/stack-exchange/stack-overflow?project=stackoverflowquery-407019 (and also in the "Hello, world" Google BigQuery tutorial...), has not been updated since 24 Nov 2022 (see the screenshot).
fh-bigquery.stackoverflow_archive_questions.merged contains no quarter later than March 2019 (see the SELECT distinct quarter FROM fh-bigquery.stackoverflow_archive_questions.merged query, results are below).
Row((datetime.date(2017, 6, 1),), {'quarter': 0})
Row((datetime.date(2017, 3, 1),), {'quarter': 0})
Row((datetime.date(2017, 12, 1),), {'quarter': 0})
Row((datetime.date(2018, 6, 1),), {'quarter': 0})
Row((datetime.date(2019, 3, 1),), {'quarter': 0})
Row((datetime.date(2018, 9, 1),), {'quarter': 0})
Row((datetime.date(2017, 9, 1),), {'quarter': 0})
Row((datetime.date(2018, 3, 1),), {'quarter': 0})
Row((datetime.date(2018, 12, 1),), {'quarter': 0})
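Since the rows above come back unsorted, here is a minimal sketch of how the newest quarter could be picked out, either locally or by letting BigQuery compute MIN/MAX. The helper names (`freshness_sql`, `newest_quarter`, `fetch_freshness`) are made up for illustration, and `fetch_freshness` assumes the `google-cloud-bigquery` package is installed and credentials are configured.

```python
import datetime


def freshness_sql(table: str, column: str = "quarter") -> str:
    """Build a query returning the oldest and newest value of a date column."""
    return (
        f"SELECT MIN({column}) AS oldest, MAX({column}) AS newest "
        f"FROM `{table}`"
    )


def newest_quarter(quarters):
    """Return the latest date from an iterable of datetime.date values."""
    return max(quarters)


def fetch_freshness(table: str):
    """Run the freshness query in BigQuery (needs credentials and a billing project)."""
    # Imported lazily so the local helpers work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    row = next(iter(client.query(freshness_sql(table)).result()))
    return row["oldest"], row["newest"]


# The nine quarters listed in the query output above.
quarters = [
    datetime.date(2017, 6, 1),
    datetime.date(2017, 3, 1),
    datetime.date(2017, 12, 1),
    datetime.date(2018, 6, 1),
    datetime.date(2019, 3, 1),
    datetime.date(2018, 9, 1),
    datetime.date(2017, 9, 1),
    datetime.date(2018, 3, 1),
    datetime.date(2018, 12, 1),
]
print(newest_quarter(quarters))  # 2019-03-01
```

Calling `fetch_freshness("fh-bigquery.stackoverflow_archive_questions.merged")` would return the same bounds without pulling every distinct quarter to the client.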
According to the details of the bigquery-public-data.stackoverflow.posts_questions table, it was last updated on Nov 25, 2022, 5:10:40 AM UTC+5:30.
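The timestamp from the table details is shown in UTC+5:30, which is why it reads Nov 25 while the question says 24 Nov. A small sketch of the conversion, plus one way the same value might be read programmatically: `table_last_modified` is a hypothetical helper that assumes the `google-cloud-bigquery` client library and working credentials.

```python
from datetime import datetime, timedelta, timezone

# UTC+5:30, the offset shown in the table details.
IST = timezone(timedelta(hours=5, minutes=30))


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC."""
    return moment.astimezone(timezone.utc)


def table_last_modified(table_id: str) -> datetime:
    """Read a table's last-modified time from its metadata (needs credentials)."""
    # Imported lazily so the conversion above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.get_table(table_id).modified


reported = datetime(2022, 11, 25, 5, 10, 40, tzinfo=IST)
print(to_utc(reported).isoformat())  # 2022-11-24T23:40:40+00:00
```

So both dates describe the same update; it is late on 24 Nov 2022 in UTC.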
This is a known issue, and a request has already been filed for it. You can vote for it by clicking +1, and star it to receive updates.
As a workaround, if you want to query the data from the BigQuery console, you can store Stack Overflow's openly published dataset files in Google Cloud Storage and query them directly from the console.
For more information, refer to link1, link2, and link3.
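A rough sketch of what that workaround could look like in Python. It rests on several assumptions that are not stated above: that the dataset files are XML made of `<row .../>` elements (for example a `Posts.xml`), that you convert them to newline-delimited JSON before uploading, and that you load the result into a table rather than querying it in place. The helper names, the bucket URI, and the table ID are all hypothetical, and the load step needs the `google-cloud-bigquery` package and credentials.

```python
import io
import json
import xml.etree.ElementTree as ET


def xml_rows_to_ndjson(xml_file, out_file) -> int:
    """Write each <row .../> element's attributes as one JSON line; return the count."""
    count = 0
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "row":
            out_file.write(json.dumps(dict(elem.attrib)) + "\n")
            count += 1
        elem.clear()  # drop the element's data once it has been written
    return count


def load_ndjson_from_gcs(uri: str, table_id: str) -> int:
    """Load newline-delimited JSON from Cloud Storage into a BigQuery table."""
    # Imported lazily so the converter above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # let BigQuery infer the schema
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows


# Tiny in-memory stand-in for a real dump file.
sample = io.BytesIO(b'<posts><row Id="1" PostTypeId="1" Title="Hello" /></posts>')
out = io.StringIO()
print(xml_rows_to_ndjson(sample, out))  # 1
```

After uploading the converted file, something like `load_ndjson_from_gcs("gs://your-bucket/posts.json", "your_project.your_dataset.posts")` would make it queryable from the console; both names are placeholders.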