KeyError when setting values (binning) in a pandas column using conditional statements and loops


I tried to bin the Cnum column using the Cabin and Cnum values. I first tried the .apply() method, but I need to check two columns for the binning. I then tried the .iterrows() method, but didn't get satisfactory results. I've been trying these methods for three straight hours, so a helping hand would be much appreciated.

for i in range(len(training["Forward"])):
    
    if training.loc[i,"B"] & training.loc[i,"Cnum"]>=63 & training[i,"Cnum"]<=100:
        training[i,"Forward"]=0
    else:
        training[i,"Forward"]=1
            

I get the following error, which didn't give me any usable information:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/conda/lib/python3.7/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'B'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_16/1974994097.py in <module>
      1 for i in range(len(training["Forward"])):
      2 
----> 3     if training.loc[i,"B"] & training.loc[i,"Cnum"]>=63 & training[i,"Cnum"]<=100:
      4         training[i,"Forward"]=0
      5     else:

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    923                 with suppress(KeyError, IndexError):
    924                     return self.obj._get_value(*key, takeable=self._takeable)
--> 925             return self._getitem_tuple(key)
    926         else:
    927             # we by definition only have the 0th axis

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
   1098     def _getitem_tuple(self, tup: tuple):
   1099         with suppress(IndexingError):
-> 1100             return self._getitem_lowerdim(tup)
   1101 
   1102         # no multi-index, so validate all of the indexers

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_lowerdim(self, tup)
    860                     return section
    861                 # This is an elided recursive call to iloc/loc
--> 862                 return getattr(section, self.name)[new_key]
    863 
    864         raise IndexingError("not applicable")

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    929 
    930             maybe_callable = com.apply_if_callable(key, self.obj)
--> 931             return self._getitem_axis(maybe_callable, axis=axis)
    932 
    933     def _is_scalar_access(self, key: tuple):

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1162         # fall thru to straight lookup
   1163         self._validate_key(key, axis)
-> 1164         return self._get_label(key, axis=axis)
   1165 
   1166     def _get_slice_axis(self, slice_obj: slice, axis: int):

/opt/conda/lib/python3.7/site-packages/pandas/core/indexing.py in _get_label(self, label, axis)
   1111     def _get_label(self, label, axis: int):
   1112         # GH#5667 this will fail if the label is not present in the axis.
-> 1113         return self.obj.xs(label, axis=axis)
   1114 
   1115     def _handle_lowerdim_multi_index_axis0(self, tup: tuple):

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in xs(self, key, axis, level, drop_level)
   3774                 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
   3775         else:
-> 3776             loc = index.get_loc(key)
   3777 
   3778             if isinstance(loc, np.ndarray):

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

Solution

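  • You can use .apply() with axis=1 so that each whole row is passed to a function. I'd write a function that does the check you want on the row and returns the bin value. Something like this:

    def binme(row):
      # Return 0 when B is truthy and Cnum is within [63, 100], otherwise 1.
      # Assumes the DataFrame has columns named B and Cnum.
      if row.B and 63 <= row.Cnum <= 100:
        return 0
      return 1
    
    # axis=1 passes each row to binme; df is your DataFrame (training in the question).
    df["Forward"] = df.apply(binme, axis=1)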
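A few notes on the original loop, plus a vectorized alternative. `KeyError: 'B'` means `training` has no column labelled "B". Separately, `training[i, "Cnum"]` without `.loc` looks for a column whose label is the tuple `(i, "Cnum")`, and `&` binds more tightly than `>=` and `<=`, so each comparison needs its own parentheses. The sketch below avoids the loop entirely with boolean masks and `numpy.where`. It makes some guesses: the sample data is made up, and I'm assuming that "B" means the Cabin string starts with the letter B, which the question does not confirm. Swap in whatever actually defines "B" in your data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real `training` frame is not shown in the question.
training = pd.DataFrame({
    "Cabin": ["B70", "C85", "B20", "B100", None],
    "Cnum": [70, 85, 20, 100, np.nan],
})

# Assumption: "B" means the cabin label starts with the letter B.
# na=False treats missing cabins as "not B".
is_b = training["Cabin"].str.startswith("B", na=False)

# between() is inclusive on both ends by default; NaN values compare as False.
in_range = training["Cnum"].between(63, 100)

# 0 where both conditions hold, 1 everywhere else.
training["Forward"] = np.where(is_b & in_range, 0, 1)

print(training)
```

This should generally be faster than `.apply(..., axis=1)` on large frames, since the comparison runs on whole columns instead of one row at a time.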