I am running a loop that will run some process create a dataframe and the dataframe is then added to big query table. But when i am appending on existing table I am getting a error. Please verify that the structure and data types in the DataFrame match the schema of the destination table. The values are coming from functions.
from pandas.io import gbq
import pandas as pd
import numpy as np
import datetime as dt
from datalab.context import Context
import time
for id_name in ID_:
df= ['id_recip','length_data','length_action', 'daily_mail_freq', 'weekly_mail_frequency', 'imp_hour', 'imp_day']
columns = list(df)
data=[]
values = [id_name,length_data,length_action, daily_mail, weekly_mail, imp_hour, imp_day]
zipped = zip(columns, values)
a_dictionary = dict(zipped)
print(a_dictionary)
final_output=pd.DataFrame(a_dictionary)
final_output = final_output.astype(str)
final_output.info()
final_output.to_gbq('internal.frequency_output3',
Context.default().project_id,
if_exists='append')
I converted all data in dataframe to string to avoid datatype mismatch. On first loop table is getting created if it dont exist.
Structure in bigquery table
daily_mail_freq STRING NULLABLE
id_recip STRING NULLABLE
imp_day STRING NULLABLE
imp_hour STRING NULLABLE
length_action STRING NULLABLE
length_data STRING NULLABLE
weekly_mail_frequency STRING NULLABLE
There are no dates in them
One way of doing this is using google.cloud bigquery. in this case it changes to a sql statement and push data rather than using dataframe.
def export_items_to_bigquery(daily_mail_freq,id_recip,imp_day,imp_hour,length_action,length_data,weekly_mail_frequency ):
# Instantiates a client
client = bigquery.Client()
bigquery_client = bigquery.Client()
# Prepares a reference to the dataset
dataset_ref = bigquery_client.dataset('dbn')
table_ref = dataset_ref.table('fqo')
table = bigquery_client.get_table(table_ref)
rows_to_insert = [
(daily_mail_freq , id_recip, imp_day, imp_hour, length_action , length_data, weekly_mail_frequency)]
errors = bigquery_client.insert_rows(table, rows_to_insert) # API request
assert errors == []
now in loop just pass the data to the function