I'm using Datalab on Google Cloud Platform and was trying to create a BigQuery dataset with google.datalab.bigquery when I found that I needed the Client class, which is only available in the google.cloud.bigquery library.
What's the difference between the datalab and cloud versions of the bigquery library?
Is the datalab one a slimmed down version of the cloud library, or do they have different intended uses?
google.cloud.bigquery is the Python client library for BigQuery. It provides access to all the functionality of the BigQuery REST API and is similar to the client libraries for Java, Go, C++ and other languages. It is essentially the idiomatic Python wrapper for things you can do with the BigQuery service.
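Since the question was about creating a dataset, here is a minimal sketch of how that might look with the client library. The helper name ensure_dataset and the dataset name my_new_dataset are made up for illustration, and the sketch assumes the google-cloud-bigquery package is installed and that default credentials are available in the notebook.

```python
def ensure_dataset(client, dataset_name):
    """Create dataset_name in the client's default project if it is missing.

    `client` is expected to be a google.cloud.bigquery.Client.
    ensure_dataset is a hypothetical helper, not part of the library.
    """
    # Fully qualified ID of the form "project.dataset".
    dataset_id = f"{client.project}.{dataset_name}"
    # exists_ok=True makes the call safe to repeat: no error is raised
    # if the dataset is already there.
    return client.create_dataset(dataset_id, exists_ok=True)


def demo():
    # Imported here so the helper above can be read without the package.
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()  # picks up the default project and credentials
    dataset = ensure_dataset(client, "my_new_dataset")  # placeholder name
    print(f"Dataset ready: {dataset.project}.{dataset.dataset_id}")
```

Calling demo() from a notebook cell should create the dataset on the first run and leave it untouched on later runs.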
google.datalab.bigquery is a Python library that is meant for use within notebooks by data scientists. For example, it has a method to take a BigQuery result set and convert it into a pandas DataFrame. The Datalab package also includes mltoolbox, which simplifies training and evaluation of machine learning models. There is no Java or Go equivalent. It uses the client library to actually talk to BigQuery.
Update (July 2019): google.cloud.bigquery has now been updated to include many of the nice things the datalab package used to provide, including pandas interoperability. At this point, google.cloud.bigquery should be considered the preferred way to do things, even in notebooks. For example, the %%bigquery magic comes as part of google.cloud.bigquery. Instead of using mltoolbox in Datalab, use BigQuery ML to train ML models directly in BigQuery.
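To illustrate the pandas interoperability mentioned in the update, a query result can be pulled into a DataFrame roughly as follows. This is a sketch, not the only way to do it: query_to_dataframe is a hypothetical helper, the query against the public usa_names table is just an example, and to_dataframe() assumes pandas (and, on recent versions, db-dtypes) is installed.

```python
def query_to_dataframe(client, sql):
    """Run sql and return the result as a pandas DataFrame.

    `client` is expected to be a google.cloud.bigquery.Client.
    query_to_dataframe is a hypothetical helper, not part of the library.
    """
    # client.query() starts the job; to_dataframe() waits for it to finish
    # and downloads the rows.
    return client.query(sql).to_dataframe()


def demo():
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()
    # Example query against a public dataset; swap in your own table.
    sql = """
        SELECT name, SUM(number) AS total
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        GROUP BY name
        ORDER BY total DESC
        LIMIT 5
    """
    print(query_to_dataframe(client, sql))

# Notebook alternative using the cell magic (run in separate cells):
#   %load_ext google.cloud.bigquery
#   %%bigquery df
#   SELECT 1 AS x
```

With the magic, the name given after %%bigquery (df here) becomes the variable holding the resulting DataFrame.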