python pandas numpy dataframe compiler-errors

Interpreting a “Traceback (most recent call last):” error


I realize this issue has been explained many times before, so I understand if this is closed as a duplicate, but I have some more theoretical questions that may justify a new question. I'm new to Python (and SO), so bear with me.

I'm trying to read in a .csv file that has 16 columns and roughly 30,000 rows, populated with values from 0 to 17. There are no empty cells. What I would like to do is iterate through each of the rows, doing entry-wise subtraction with the cells from every other row. Currently, I'm attempting to do this using a Pandas DataFrame. So my first question is: should I be using a different data structure? I've read that DataFrames are bad for iterating through rows.

Next, for the title question, I need help interpreting my error. Thus far, I've only written code to try this subtraction on a small subset of the data. Here's my code:

import numpy as np
import pandas as pd
scrambles = pd.read_csv('scrambles.csv')
df = pd.DataFrame(scrambles)
#print(df)
columns = list(df)
for i in columns:
    print (df[i][0]-df[i][1])

This all works as anticipated. However, when I change the last piece of code to the following, I get an error:

for i in range(15):
    print (df[i][0]-df[i][1])

I'll post a transcript of the error below. The reason I'm trying to do it this way, even though I already have working code, is that in the full script I'll be iterating over a known number of rows. For what it's worth, I'm running this in an online Jupyter notebook.



KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2889             try:
-> 2890                 return self._engine.get_loc(key)
   2891             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-6-0faa876fbe56> in <module>
      1 for i in range(15):
----> 2     print (df[i][0]-df[i][1])

/srv/conda/envs/notebook/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2973             if self.columns.nlevels > 1:
   2974                 return self._getitem_multilevel(key)
-> 2975             indexer = self.columns.get_loc(key)
   2976             if is_integer(indexer):
   2977                 indexer = [indexer]

/srv/conda/envs/notebook/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2890                 return self._engine.get_loc(key)
   2891             except KeyError:
-> 2892                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2893         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2894         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

Solution

  • I'll expand on my comment to answer the original question - interpreting the exception.

    The error occurs because your dataframe most likely does not use integers for its column names, so the integers 0 through 14 produced by range(15) are not valid column labels. The lookup already fails on the first one, which is why both exceptions end with the same line: KeyError: 0
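
    A minimal sketch of that situation, assuming the CSV has a header row of text labels. The inline data and the names a, b, c are hypothetical stand-ins for scrambles.csv, not taken from the question:

```python
import io
import pandas as pd

# Hypothetical stand-in for scrambles.csv: a header row of text labels.
csv_text = "a,b,c\n3,1,4\n1,5,9\n"
df = pd.read_csv(io.StringIO(csv_text))

# read_csv took the column names from the header row, so they are strings.
column_labels = list(df.columns)
print(column_labels)

# The integer 0 is not one of those labels, so df[0] raises KeyError.
try:
    df[0]
    lookup_failed = False
except KeyError as exc:
    lookup_failed = True
    print("KeyError:", exc)
```

    Printing df.columns on the real dataframe is a quick way to check which labels df[...] will accept.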

    The rest of the traceback gives you additional context about where the error happened.

    When you try to access column 0 of your dataframe, execution reaches line 2890 of base.py, inside the function get_loc().

    There, the first KeyError is caught by the surrounding try/except. However, the fallback call inside the except block (line 2892 in the second traceback) raises a KeyError as well, and nothing handles that one. This is where the "During handling of the above exception, another exception occurred:" message comes from.

    To illustrate using the code itself:

                ...
                try:
                    return self._engine.get_loc(key) # <- KeyError raised here
                except KeyError:                     # <- Caught by except
                    return self._engine.get_loc(self._maybe_cast_indexer(key)) # <- 2nd KeyError
                ...
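
    The same chaining can be reproduced without pandas. This is only an illustrative sketch: the dictionary and the function lookup_with_fallback are made up for the example and merely mimic the try/except shape of get_loc():

```python
import traceback

# Hypothetical lookup table that, like the dataframe, only has a string key.
table = {"a": 1}

def lookup_with_fallback(key):
    try:
        return table[key]        # first KeyError is raised here
    except KeyError:             # ...and caught here
        return table[int(key)]   # the fallback raises a second KeyError

try:
    lookup_with_fallback(0)
except KeyError as exc:
    # The second exception remembers the first one in __context__,
    # which is what produces the "During handling..." message.
    first_error = exc.__context__
    report = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    print(report)
```

    The printed report contains two tracebacks separated by the "During handling of the above exception, another exception occurred:" line, just like the output in the question.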
    

    Finally, as I said in the comment, the final line of the Traceback reveals the error:

    KeyError: 0
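
    If the goal is to loop over a known number of columns by position, one option is to index by position instead of by label. The following is a sketch under the same assumption as above (string column labels; the inline data is hypothetical), using .iloc and df.columns:

```python
import io
import pandas as pd

# Hypothetical stand-in for scrambles.csv with string column labels.
csv_text = "a,b,c\n3,1,4\n1,5,9\n"
df = pd.read_csv(io.StringIO(csv_text))

# .iloc[row, column] selects by integer position and ignores the labels.
diffs = [df.iloc[0, i] - df.iloc[1, i] for i in range(df.shape[1])]

# Alternatively, translate each position into its label first.
diffs_by_label = [
    df[df.columns[i]][0] - df[df.columns[i]][1]
    for i in range(len(df.columns))
]

# Subtracting whole rows avoids the explicit loop over columns.
row_diff = df.iloc[0] - df.iloc[1]
print(row_diff.tolist())
```

    All three variants compute the same row 0 minus row 1 differences; the last one leaves the per-column work to pandas.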