I have written some code for a linear regression model to predict house prices. I'm writing exactly the same code as in a tutorial video. When I write random_state=42 it works without any error, but when I change random_state to any other number it gives this error.
Here is the code:
from sklearn.model_selection import train_test_split
X = data.drop('SalesPrice', axis = 1)
y = data['SalesPrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
predictions = lr.predict(X_test)
print("Actual value of the house: ", y_test[0])
print("Model prediction value: ", predictions[0])
and this is the error:
KeyError Traceback (most recent call last)
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3653, in Index.get_loc(self, key)
3652 try:
-> 3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\_libs\index.pyx:147, in pandas._libs.index.IndexEngine.get_loc()
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\_libs\index.pyx:176, in pandas._libs.index.IndexEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:2606, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:2630, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[66], line 3
1 predictions = lr.predict(X_test)
----> 3 print("Actual value of the house: ", y_test[0])
4 print("Model prediction value: ", predictions[0])
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\series.py:1007, in Series.__getitem__(self, key)
1004 return self._values[key]
1006 elif key_is_scalar:
-> 1007 return self._get_value(key)
1009 if is_hashable(key):
1010 # Otherwise index.get_value will raise InvalidIndexError
1011 try:
1012 # For labels that don't resolve as scalars like tuples and frozensets
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\series.py:1116, in Series._get_value(self, label, takeable)
1113 return self._values[label]
1115 # Similar to Index.get_value, but we do not fall back to positional
-> 1116 loc = self.index.get_loc(label)
1118 if is_integer(loc):
1119 return self._values[loc]
File C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\indexes\base.py:3655, in Index.get_loc(self, key)
3653 return self._engine.get_loc(casted_key)
3654 except KeyError as err:
-> 3655 raise KeyError(key) from err
3656 except TypeError:
3657 # If we have a listlike key, _check_indexing_error will raise
3658 # InvalidIndexError. Otherwise we fall through and re-raise
3659 # the TypeError.
3660 self._check_indexing_error(key)
KeyError: 0
As the traceback mentions, the error originates from print("Actual value of the house: ", y_test[0]).
y_test[0] is a label-based lookup, not a positional one. train_test_split keeps the original index labels, so y_test[0] only works when the randomly selected 20% of the data happens to include the row whose index label is 0. That's why it works for some values of random_state and not for most.
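To make the label-versus-position point concrete, here is a minimal sketch. The toy Series and the chosen labels 7 and 2 are made up for illustration; they only stand in for whatever rows train_test_split happens to pick.

```python
import pandas as pd

# Hypothetical target column with the default integer index 0..4.
y = pd.Series([100, 110, 120, 130, 140, 150, 160, 170], name="SalesPrice")

# Stand-in for a test split: a subset keeps its ORIGINAL index labels.
y_test = y.loc[[7, 2]]
print(y_test.index.tolist())  # the labels are 7 and 2, there is no label 0

# y_test[0] looks for the label 0, which is not in this subset.
try:
    y_test[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print("KeyError raised:", raised_key_error)
```

With random_state=42 the row labelled 0 presumably ended up in the test set, which is why that particular seed appeared to work.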
Generally you want to use either:
y_test.to_list()[0]
y_test.iloc[0]
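A small sketch of how these options behave on a subset whose labels do not start at 0. The Series values and labels are invented for illustration, and reset_index(drop=True) is an extra option not mentioned above, shown only as one possible alternative.

```python
import pandas as pd

# Hypothetical test target whose index labels are 7 and 2 (no label 0).
y_test = pd.Series([170, 120], index=[7, 2], name="SalesPrice")

# Positional access: first element regardless of its label.
first_by_position = y_test.iloc[0]

# Converting to a plain Python list also gives positional access.
first_via_list = y_test.to_list()[0]

# Alternative (assumption, not from the answer): renumber the labels 0..n-1
# so that label 0 exists again.
y_test_reset = y_test.reset_index(drop=True)
first_after_reset = y_test_reset[0]

print(first_by_position, first_via_list, first_after_reset)
```

Since predictions is a NumPy array and is already positional, y_test.iloc[0] lines up with predictions[0].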
TLDR: Replace y_test[0] in your print statement with y_test.iloc[0].