python-3.x pandas google-bigquery google-cloud-datalab

Getting error "Please verify that the structure and data types in the DataFrame match the schema of the destination table" when appending


I am running a loop that runs some processing, creates a DataFrame, and appends the DataFrame to a BigQuery table. But when I append to an existing table, I get the error: "Please verify that the structure and data types in the DataFrame match the schema of the destination table." The values come from functions.

from pandas.io import gbq
import pandas as pd
import numpy as np
import datetime as dt
from datalab.context import Context
import time

for id_name in ID_:
    df = ['id_recip', 'length_data', 'length_action', 'daily_mail_freq', 'weekly_mail_frequency', 'imp_hour', 'imp_day']
    columns = list(df)
    data = []
    values = [id_name, length_data, length_action, daily_mail, weekly_mail, imp_hour, imp_day]
    zipped = zip(columns, values)
    a_dictionary = dict(zipped)
    print(a_dictionary)
    final_output = pd.DataFrame(a_dictionary)
    final_output = final_output.astype(str)
    final_output.info()
    final_output.to_gbq('internal.frequency_output3',
                        Context.default().project_id,
                        if_exists='append')

I converted all data in the DataFrame to strings to avoid a datatype mismatch. On the first loop iteration the table is created if it doesn't exist.

Structure of the BigQuery table:

    daily_mail_freq        STRING  NULLABLE
    id_recip               STRING  NULLABLE
    imp_day                STRING  NULLABLE
    imp_hour               STRING  NULLABLE
    length_action          STRING  NULLABLE
    length_data            STRING  NULLABLE
    weekly_mail_frequency  STRING  NULLABLE

There are no dates in the data.
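For reference, pandas-gbq raises this error when the DataFrame's schema does not match the destination table's, and in some versions the comparison can be sensitive to column order. Forcing the columns into the table's field order before appending rules that out (a sketch; the field names are taken from the schema above, and column order being the culprit is only a guess):

    schema_order = ['daily_mail_freq', 'id_recip', 'imp_day', 'imp_hour',
                    'length_action', 'length_data', 'weekly_mail_frequency']
    final_output = final_output[schema_order].astype(str)
    final_output.to_gbq('internal.frequency_output3',
                        Context.default().project_id,
                        if_exists='append')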


Solution

  • One way of doing this is to use the google.cloud.bigquery client directly. In this case, rows are streamed into the table with the client rather than loaded through a DataFrame.

    from google.cloud import bigquery

    def export_items_to_bigquery(daily_mail_freq, id_recip, imp_day, imp_hour,
                                 length_action, length_data, weekly_mail_frequency):
        # Instantiate a single client
        bigquery_client = bigquery.Client()

        # Build a reference to the destination table
        dataset_ref = bigquery_client.dataset('dbn')
        table_ref = dataset_ref.table('fqo')
        table = bigquery_client.get_table(table_ref)

        # Tuple order must match the table's field order
        rows_to_insert = [
            (daily_mail_freq, id_recip, imp_day, imp_hour,
             length_action, length_data, weekly_mail_frequency)]
        errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request (streaming insert)
        assert errors == [], errors
    

    Now, in the loop, just pass the data to the function.
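    A minimal sketch of that loop, reusing the variable names from the question (the argument order must match the function signature above):

        for id_name in ID_:
            # values such as daily_mail, imp_day, etc. are computed per id
            # as in the original loop
            export_items_to_bigquery(daily_mail, id_name, imp_day, imp_hour,
                                     length_action, length_data, weekly_mail)

    Note that insert_rows uses the BigQuery streaming API, so newly inserted rows can take a short time to become visible to queries.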