python, google-bigquery, google-cloud-platform, google-python-api

Load large data from BigQuery into Python


from google.cloud import bigquery as bq
import pandas as pd

query = '''select ... from ...'''

bigquery_client = bq.Client()
table = bq.query.QueryResults(query=query, client=bigquery_client)
table.use_legacy_sql = False
table.run()

# transfer the BigQuery results into a pandas DataFrame
columns = [field.name for field in table.schema]
rows = table.fetch_data()
data = []
for row in rows:
    data.append(row)

df = pd.DataFrame(data=data[0], columns=columns)

I want to load more than 10 million rows into Python. This worked fine a few weeks ago, but now the same code only returns 100,000 rows. Does anyone know a reliable way to do this?


Solution

  • I just tested the code below and was able to fetch 3 million rows with no cap being applied:

    import os
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path/to/key.json'

    from google.cloud.bigquery import Client

    bc = Client()
    query = 'your query'

    # run the query synchronously using standard SQL
    job = bc.run_sync_query(query)
    job.use_legacy_sql = False
    job.run()

    # fetch_data() pages through the full result set; list() collects every row
    data = list(job.fetch_data())
    

    Does it work for you?
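
    If you need the result as a pandas DataFrame, as in the question, a minimal sketch along the same lines (assuming the same pre-0.28 google-cloud-bigquery client as above, where run_sync_query and fetch_data are available) would be:

    import pandas as pd

    # column names come from the schema of the completed query
    columns = [field.name for field in job.schema]

    # collect every row from every result page
    rows = list(job.fetch_data())

    df = pd.DataFrame(rows, columns=columns)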