pandasdataframefor-looppython-camelotpandas-to-sql

How do I make sure not only the last element of a for loop dataframe is posted via to_sql to the database?


I try to post the dataframe to pgadmin. The dataframe was edited in a for loop, but when I post the data it posts only the last element in the for loop. I tried to use the to_sql inside the for loop and outside.

tables = camelot.read_pdf(pdf_path, pages = 'all', flavor = 'stream')
    for table in tables[:-1]:
         df = table.df
         df3 = pd.DataFrame()
         df3['nr'] = df2[0].values.astype(str).flatten().tolist()
         df3.to_sql('new_pilsCorner4', engine,if_exists='replace')
         engine = create_engine(f'postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

I tried to append the dataframe to an array and get a list, which I push into another dataframe outside the loop but that didn't work because of a missmatch between the columns.


Solution

  • You use replace which means you are ovrwritting the content in each run of your loop.

    Do either this:

    tables = camelot.read_pdf(pdf_path, pages = 'all', flavor = 'stream')
        for table in tables[:-1]:
             df = table.df
             df3 = pd.DataFrame()
             df3['nr'] = df2[0].values.astype(str).flatten().tolist()
             df3.to_sql('new_pilsCorner4', engine,if_exists='append')
             engine = create_engine(f'postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
    

    or

    tables = camelot.read_pdf(pdf_path, pages='all', flavor='stream')
    engine = create_engine('postgresxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
    for table in tables[:-1]:
        df = table.df
        df3 = pd.DataFrame()
        df3['nr'] = df2[0].values.astype(str).flatten().tolist()
        df3.to_sql('new_pilsCorner4', engine, if_exists='append', index=False)