pythonpandasdataframefunctionkeyerror

python DataFrame KeyError when passed into function


I have a program to compile bills received from e-mail, everything was working fine till i decided to pass a DataFrame as requirement of a function now i receive a KeyError error.

Below is the "bill_data" DataFrame:

bill_data

Fatura Nota Valor vencimento
0 8375 385.00 16/10/2023
0 8376 362.10 16/10/2023
22578 8386 1197.89 16/10/2023
22578 8387 1505.38 16/10/2023
22582 8388 2049.82 16/10/2023
22582 8389 4138.58 16/10/2023
22579 8390 3546.34 16/10/2023
22579 8391 1389.32 16/10/2023

My function simply get each unique "Fatura", reads an xlsx archive and pass the bill_data "Fatura", "Nota" and "vencimento" information based on "Valor" that is a sum of a column from the xlsx archive.

Below is my function "compile_extract":

compile_extract

def compile_extract(bill: pd.DataFrame, folder: Path):    # ! erro com o dataframe
    ## Final product will look like this
    extract = pd.DataFrame(
        columns = [
            'REL', 'ID', 'ND', 'FAT', 'VencimentoFatura', 'StatusFatura',
            'StatusRelatorio', 'StatusSAP', 'Unidade de Negócio',
            'CNPJ Unidade de Negócio', 'Relatório', 'Descrição',
            'Motivo', 'Viajante', 'Serviço', 'Data de Emissão',
            'Destino', 'Fornecedor', 'Data Início da Viagem',
            'Data Fim da Viagem', 'Localizador',
            'Código Centro de Custo',
            'Descrição do Centro de Custo',
            'Rateio de Centro de Custo',
            'Valor', 'Código Projeto',
            'Nome Projeto'
        ]
    )
    
    ## Reutnrs only unique Fatura values
    bills = bill['Fatura'].unique()
    
    for i in range(len(bills)):

        ## Current Fatura number
        bill_num = bills[i]

        ## Ignores zero values in Fatura
        if bill_num != 0:
            ## Read the extract
            data = pd.read_excel(folder / str(str(bill_num) + ".xlsx"))
    
            ## Hotel services
            bill_hotel = data.loc[data['Serviço'] == str("Hotel")]
            bill_hotel_sum = round(bill_hotel['Valor'].sum(), 2)
    
            ## Other services
            bill_else = data.loc[data['Serviço']!=str("Hotel")]
            bill_else_sum = round(bill_else['Valor'].sum(), 2)
            
            ## Maturity date
            bill_maturity = bill.loc[bill['Fatura'] == int(bill_num)]['vencimento'].iloc[0]
    
            # Hotel services data
            ## Nota number
            invoice_hotel_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_hotel_sum)]['Nota']
            if invoice_hotel_num.empty == False:
                invoice_hotel_num.values[0]
                ## Add extra columns
                bill_hotel.insert(0, "REL", None,True)
                bill_hotel.insert(1, "ID", None,True)
                bill_hotel.insert(2, "ND", int(invoice_hotel_num),True)
                bill_hotel.insert(3, "FAT", int(bill_num), True)
                bill_hotel.insert(4, "VencimentoFatura", bill_maturity, True)
                bill_hotel.insert(5, "StatusFatura", None, True)
                bill_hotel.insert(6, "StatusRelatorio", None, True)
                bill_hotel.insert(7, "StatusSAP", None, True)
    
            # Other services data
            ## Nota number
            invoice_else_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_else_sum)]['Nota']
            if invoice_else_num.empty == False:
                invoice_else_num.values[0]
                ## Add extra columns
                bill_else.insert(0, "REL", None,True)
                bill_else.insert(1, "ID", None,True)
                bill_else.insert(2, "ND", int(invoice_else_num),True)
                bill_else.insert(3, "FAT", int(bill_num), True)
                bill_else.insert(4, "VencimentoFatura", bill_maturity, True)
                bill_else.insert(5, "StatusFatura", None, True)
                bill_else.insert(6, "StatusRelatorio", None, True)
                bill_else.insert(7, "StatusSAP", None, True)
            
            ## Concat both services
            bill = pd.concat([bill_hotel, bill_else]).sort_index()
            extract = pd.concat([extract, bill]).reset_index(drop = True)

    ## Removes NaN values
    extract = extract.where(pd.notnull(extract), None)
    return extract

The function now is used this way:

bill_comp = compile_extract(bill_data, download_folder)

Using this way i get a KeyError as below:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
   3652 try:
-> 3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()

File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Fatura'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
c:\Users\jec\OneDrive - XP Investimentos\Financeiro\X - Teste\Automações\600000\DEV.ipynb Célula 21 line 2
     21 # bill_file = donwload_extract(bill_data, download_folder)
     22 bill_file = [
     23     '22582.xlsx',
     24     '22578.xlsx',
     25     '22579.xlsx'
     26 ]
---> 28 bill_comp = compile_extract(bill_data, download_folder)

c:\Users\jec\OneDrive - XP Investimentos\Financeiro\X - Teste\Automações\600000\DEV.ipynb Célula 21 line 3
     33 bill_else_sum = round(bill_else['Valor'].sum(), 2)
     35 # Recupera a data de vencimento
---> 36 bill_maturity = bill.loc[bill['Fatura'] == int(bill_num)]['vencimento'].iloc[0]
     38 # Dados dos realatórios de hoteis
     39 # Retorna o número da nota (Array)
     40 invoice_hotel_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_hotel_sum)]['Nota']

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\frame.py:3761, in DataFrame.__getitem__(self, key)
   3759 if self.columns.nlevels > 1:
   3760     return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
   3762 if is_integer(indexer):
   3763     indexer = [indexer]

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
   3653     return self._engine.get_loc(casted_key)
   3654 except KeyError as err:
-> 3655     raise KeyError(key) from err
   3656 except TypeError:
   3657     # If we have a listlike key, _check_indexing_error will raise
   3658     #  InvalidIndexError. Otherwise we fall through and re-raise
   3659     #  the TypeError.
   3660     self._check_indexing_error(key)

KeyError: 'Fatura'

Before i was passing bill_data directly since it was a global variable, now i need to pass it thru the function. Done some tests and if i simply replace the variable from the function to bill_data directly the code works, what am i missing?


Solution

  • Towards the bottom of your code you're reassigning bill as bill = pd.concat([bill_hotel, bill_else]).sort_index() where both bill_hotel and bill_else might not contain the column 'Fatura'. I suspect the first iteration of the for loop will work and then as bill gets reassigned it will give you a KeyError. Check if the dataframe data contains the column Fatura or considering assigning pd.concat([bill_hotel, bill_else]).sort_index() to another dataframe