I have a program to compile bills received from e-mail, everything was working fine till i decided to pass a DataFrame as requirement of a function now i receive a KeyError error.
Below is the "bill_data" DataFrame:
bill_data
Fatura | Nota | Valor | vencimento |
---|---|---|---|
0 | 8375 | 385.00 | 16/10/2023 |
0 | 8376 | 362.10 | 16/10/2023 |
22578 | 8386 | 1197.89 | 16/10/2023 |
22578 | 8387 | 1505.38 | 16/10/2023 |
22582 | 8388 | 2049.82 | 16/10/2023 |
22582 | 8389 | 4138.58 | 16/10/2023 |
22579 | 8390 | 3546.34 | 16/10/2023 |
22579 | 8391 | 1389.32 | 16/10/2023 |
My function simply get each unique "Fatura", reads an xlsx archive and pass the bill_data "Fatura", "Nota" and "vencimento" information based on "Valor" that is a sum of a column from the xlsx archive.
Below is my function "compile_extract":
compile_extract
def compile_extract(bill: pd.DataFrame, folder: Path): # ! erro com o dataframe
## Final product will look like this
extract = pd.DataFrame(
columns = [
'REL', 'ID', 'ND', 'FAT', 'VencimentoFatura', 'StatusFatura',
'StatusRelatorio', 'StatusSAP', 'Unidade de Negócio',
'CNPJ Unidade de Negócio', 'Relatório', 'Descrição',
'Motivo', 'Viajante', 'Serviço', 'Data de Emissão',
'Destino', 'Fornecedor', 'Data Início da Viagem',
'Data Fim da Viagem', 'Localizador',
'Código Centro de Custo',
'Descrição do Centro de Custo',
'Rateio de Centro de Custo',
'Valor', 'Código Projeto',
'Nome Projeto'
]
)
## Reutnrs only unique Fatura values
bills = bill['Fatura'].unique()
for i in range(len(bills)):
## Current Fatura number
bill_num = bills[i]
## Ignores zero values in Fatura
if bill_num != 0:
## Read the extract
data = pd.read_excel(folder / str(str(bill_num) + ".xlsx"))
## Hotel services
bill_hotel = data.loc[data['Serviço'] == str("Hotel")]
bill_hotel_sum = round(bill_hotel['Valor'].sum(), 2)
## Other services
bill_else = data.loc[data['Serviço']!=str("Hotel")]
bill_else_sum = round(bill_else['Valor'].sum(), 2)
## Maturity date
bill_maturity = bill.loc[bill['Fatura'] == int(bill_num)]['vencimento'].iloc[0]
# Hotel services data
## Nota number
invoice_hotel_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_hotel_sum)]['Nota']
if invoice_hotel_num.empty == False:
invoice_hotel_num.values[0]
## Add extra columns
bill_hotel.insert(0, "REL", None,True)
bill_hotel.insert(1, "ID", None,True)
bill_hotel.insert(2, "ND", int(invoice_hotel_num),True)
bill_hotel.insert(3, "FAT", int(bill_num), True)
bill_hotel.insert(4, "VencimentoFatura", bill_maturity, True)
bill_hotel.insert(5, "StatusFatura", None, True)
bill_hotel.insert(6, "StatusRelatorio", None, True)
bill_hotel.insert(7, "StatusSAP", None, True)
# Other services data
## Nota number
invoice_else_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_else_sum)]['Nota']
if invoice_else_num.empty == False:
invoice_else_num.values[0]
## Add extra columns
bill_else.insert(0, "REL", None,True)
bill_else.insert(1, "ID", None,True)
bill_else.insert(2, "ND", int(invoice_else_num),True)
bill_else.insert(3, "FAT", int(bill_num), True)
bill_else.insert(4, "VencimentoFatura", bill_maturity, True)
bill_else.insert(5, "StatusFatura", None, True)
bill_else.insert(6, "StatusRelatorio", None, True)
bill_else.insert(7, "StatusSAP", None, True)
## Concat both services
bill = pd.concat([bill_hotel, bill_else]).sort_index()
extract = pd.concat([extract, bill]).reset_index(drop = True)
## Removes NaN values
extract = extract.where(pd.notnull(extract), None)
return extract
The function now is used this way:
bill_comp = compile_extract(bill_data, download_folder)
Using this way i get a KeyError as below:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
3652 try:
-> 3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Fatura'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
c:\Users\jec\OneDrive - XP Investimentos\Financeiro\X - Teste\Automações\600000\DEV.ipynb Célula 21 line 2
21 # bill_file = donwload_extract(bill_data, download_folder)
22 bill_file = [
23 '22582.xlsx',
24 '22578.xlsx',
25 '22579.xlsx'
26 ]
---> 28 bill_comp = compile_extract(bill_data, download_folder)
c:\Users\jec\OneDrive - XP Investimentos\Financeiro\X - Teste\Automações\600000\DEV.ipynb Célula 21 line 3
33 bill_else_sum = round(bill_else['Valor'].sum(), 2)
35 # Recupera a data de vencimento
---> 36 bill_maturity = bill.loc[bill['Fatura'] == int(bill_num)]['vencimento'].iloc[0]
38 # Dados dos realatórios de hoteis
39 # Retorna o número da nota (Array)
40 invoice_hotel_num = bill.loc[(bill['Fatura'] == int(bill_num)) & (bill['Valor'] == bill_hotel_sum)]['Nota']
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\frame.py:3761, in DataFrame.__getitem__(self, key)
3759 if self.columns.nlevels > 1:
3760 return self._getitem_multilevel(key)
-> 3761 indexer = self.columns.get_loc(key)
3762 if is_integer(indexer):
3763 indexer = [indexer]
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
-> 3655 raise KeyError(key) from err
3656 except TypeError:
3657 # If we have a listlike key, _check_indexing_error will raise
3658 # InvalidIndexError. Otherwise we fall through and re-raise
3659 # the TypeError.
3660 self._check_indexing_error(key)
KeyError: 'Fatura'
Before i was passing bill_data directly since it was a global variable, now i need to pass it thru the function. Done some tests and if i simply replace the variable from the function to bill_data directly the code works, what am i missing?
Towards the bottom of your code you're reassigning bill as bill = pd.concat([bill_hotel, bill_else]).sort_index()
where both bill_hotel and bill_else might not contain the column 'Fatura'.
I suspect the first iteration of the for loop will work and then as bill gets reassigned it will give you a KeyError.
Check if the dataframe data
contains the column Fatura or considering assigning pd.concat([bill_hotel, bill_else]).sort_index()
to another dataframe