I'm trying to read in a CSV file into a pandas dataframe and select a column, but keep getting a key error.
The file reads in successfully and I can view the dataframe in an iPython notebook, but when I want to select a column any other than the first one, it throws a key error.
I am using this code:
import pandas as pd
transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']
KeyError Traceback (most recent call last)
Cell In[10], line 2
1 transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
----> 2 transactions['quarter']
File ~\anaconda3\envs\py312\Lib\site-packages\pandas\core\frame.py:3896, in DataFrame.__getitem__(self, key)
3894 if self.columns.nlevels > 1:
3895 return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
3897 if is_integer(indexer):
3898 indexer = [indexer]
File ~\anaconda3\envs\py312\Lib\site-packages\pandas\core\indexes\base.py:3797, in Index.get_loc(self, key)
3792 if isinstance(casted_key, slice) or (
3793 isinstance(casted_key, abc.Iterable)
3794 and any(isinstance(x, slice) for x in casted_key)
3795 ):
3796 raise InvalidIndexError(key)
-> 3797 raise KeyError(key) from err
3798 except TypeError:
3799 # If we have a listlike key, _check_indexing_error will raise
3800 # InvalidIndexError. Otherwise we fall through and re-raise
3801 # the TypeError.
3802 self._check_indexing_error(key)
KeyError: 'quarter'
transactions.csv
Full data file
product_id, customer_id, store_id, promotion_id, month_of_year, quarter, the_year, store_sales, store_cost, unit_sales, fact_count
1,157,24,1869,12,'Q4',1997,'8.5500','2.9925','3.0000',1
1,456,15,0,6,'Q2',1997,'11.4000','4.3320','4.0000',1
1,638,11,0,9,'Q3',1997,'8.5500','2.9925','3.0000',1
1,916,7,0,4,'Q2',1997,'11.4000','4.9020','4.0000',1
1,923,15,0,7,'Q3',1997,'8.5500','2.7360','3.0000',1
1,1312,3,0,5,'Q2',1997,'8.5500','3.6765','3.0000',1
1,1565,24,0,9,'Q3',1997,'8.5500','4.1895','3.0000',1
1,2270,11,0,11,'Q4',1997,'8.5500','4.0185','3.0000',1
1,3065,3,0,11,'Q4',1997,'5.7000','2.5080','2.0000',1
1,3441,3,0,8,'Q3',1997,'8.5500','3.4200','3.0000',1
1,3528,17,0,10,'Q4',1997,'8.5500','3.8475','3.0000',1
1,4461,11,0,4,'Q2',1997,'8.5500','2.9925','3.0000',1
Use sep=r'\s*,\s*'
to parse a file where the columns may have some number of spaces preceding or following the delimiter (e.g. ,
):
transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
header=0, encoding='ascii', engine='python')
Prove:
print(transactions.columns)
Output:
Index(['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count'], dtype='object')
Alternatively, remove unquoted spaces in the CSV file, and use your command (unchanged).