pythonsqlpandasloopsconcatenation

Concatenate dataframes in a loop where some of them may have no values


I am using a code for a query, sometimes the input goes and there is no return (basically it does not find anything so the return is an empty row) so it is empty. However, when I use pd.concat, those empty rows disappear. Is there a way to keep these no return rows in the loop as well so that when I use that I can have empty rows on the final output.csv?

import numpy as np
import pandas as pd

from dl import authClient as ac, queryClient as qc 
from dl.helpers.utils import convert

import openpyxl as xl

wb = xl.load_workbook('/Users/somethingfile.xlsx')
sheet = wb['Sheet 1']


df = pd.DataFrame([],columns = ['col1','col2',...,'coln'])

for row in range(3, sheet.max_row + 1): 
    a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
    query = """
    SELECT a,b,c,d,e FROM smthng
      WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
    """.format(a0,b0,r)
    response = qc.query(sql=query,format='csv')
    temp_df = convert(response,'pandas')
    df = pd.concat([df,temp_df])

df.to_csv('output.csv')

Solution

  • For your specific question, it works if you check if temp_df is empty or not in each step and make it a DataFrame of NaNs if it is empty.

    Another note on the implementation is that, concatenating in each iteration will be a very expensive operation. It is much faster if you store the temp_dfs in each iteration in a list and concatenate once after the loop is over.

    lst = []  # <-- empty list to fill later
    for row in range(3, sheet.max_row + 1): 
        a0, b0, r = sheet.cell(row,1).value, sheet.cell(row,2).value, 0.001
        query = """
        SELECT a,b,c,d,e FROM smthng
          WHERE q3c_radial_query(a,b,{:f},{:f},{:f}) LIMIT 1
        """.format(a0,b0,r)
        response = qc.query(sql=query,format='csv')
        temp_df = convert(response,'pandas')
    
        if temp_df.empty:
            temp_df = pd.DataFrame([np.nan])
        lst.append(temp_df)
        
    df = pd.concat(lst)   # <-- concat once
    df.to_csv('output.csv', index=False)