This is an example from a book that I can't figure out or get to work. I was able to solve other cases, but this one has become a challenge. When I run it, I get this message:
Traceback (most recent call last):
File "c:\BAULO\PYTHON\ESTRUCTURAS\P_ML\UNIDAD6\Book16.py", line 14, in <module>
data_y = msu_df[w]
File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3810, in __getitem__
indexer = self.columns._get_indexer_strict(key, "columns")[1]
File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 6111, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 6171, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([('N_Applications',)], dtype='object')] are in the [columns]"
The csv can be downloaded from: https://github.com/PacktPublishing/Hands-On-Data-Preprocessing-in-Python/blob/main/Chapter06/MSU%20applications.csv
I tried axis=1 and reshape, but I can't figure out the error. I know this topic has already been discussed, but what I found doesn't work for me either.
import pandas as pd
import numpy as np
from sklearn.neural_network import MLPRegressor
msu_df = pd.read_csv('MSU applications.csv')
msu_df.set_index('Year', drop=True, inplace=True)
X = ['P_Football_Performance','SMAn2']
y = 'N_Applications'
w = np.reshape(y, (1,-1))
data_X = msu_df[X]
data_y = msu_df[w]
mlp = MLPRegressor(hidden_layer_sizes=6, max_iter=100000)
print(mlp.predict(mlp.fit(data_X, data_y)))
You are doing an unnecessary step here:
w = np.reshape(y, (1,-1))
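I'm not sure why you are doing it, but that step just converts y from a string into a 2-D NumPy array of shape (1, 1). As the traceback shows, pandas then ends up looking for the label ('N_Applications',) instead of the column name N_Applications, which is why the lookup fails with a KeyError. You can pass y directly to get the label column from the dataframe msu_df: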
y = 'N_Applications'
data_y = msu_df[y]
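To make the difference concrete, here is a minimal sketch. It assumes a small stand-in dataframe (demo_df, with made-up values, not the real CSV) and only checks what each selection style returns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for msu_df; the numbers are invented for illustration
# and do not come from 'MSU applications.csv'.
demo_df = pd.DataFrame(
    {
        "P_Football_Performance": [0.5, 0.6, 0.4],
        "SMAn2": [15000, 16000, 17000],
        "N_Applications": [20000, 21000, 22000],
    },
    index=pd.Index([2019, 2020, 2021], name="Year"),
)

y = "N_Applications"

# What the original code did: the string becomes a 2-D array.
w = np.reshape(y, (1, -1))
print(type(w).__name__, w.shape)

# Selecting with the plain string gives a 1-D Series (the usual target shape).
data_y = demo_df[y]
print(type(data_y).__name__, data_y.shape)

# Selecting with a list of names gives a DataFrame with one column,
# in case a 2-D target is really needed.
data_y_2d = demo_df[[y]]
print(type(data_y_2d).__name__, data_y_2d.shape)
```

If a 2-D target is what you were after with reshape, the list form msu_df[[y]] is probably the closest equivalent, though a 1-D Series is usually what a single-output regressor expects.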
Additional: You are calling the predict() method on the result of the fit() method, which is not the correct way to make a prediction. fit() returns the fitted estimator itself, so predict() receives a model object instead of feature data. The fit() method is essentially the training stage, and the data used in this stage is the training data, which in this case is your data_X and data_y. You would want to make predictions on unseen/new data, not on the data the model was already trained on. You should replace this line:
print(mlp.predict(mlp.fit(data_X, data_y)))
with this:
mlp.fit(data_X, data_y)
Sample code for prediction from the tutorial notebook you are following:
newData = pd.DataFrame({'P_Football_Performance':0.364,'SMAn2':17198},
index=[2022])
mlp.predict(newData)
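Putting both fixes together, the whole flow might look like the sketch below. It assumes a tiny made-up dataframe in place of the real CSV (so the prediction itself is meaningless), and the random_state and max_iter values are arbitrary choices to keep the run short and repeatable:

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in for pd.read_csv('MSU applications.csv');
# the values below are invented for illustration only.
msu_df = pd.DataFrame(
    {
        "Year": [2016, 2017, 2018, 2019, 2020, 2021],
        "P_Football_Performance": [0.45, 0.61, 0.38, 0.54, 0.70, 0.42],
        "SMAn2": [15100, 15400, 15900, 16300, 16800, 17000],
        "N_Applications": [15500, 16000, 16200, 16900, 17100, 17400],
    }
)
msu_df.set_index("Year", drop=True, inplace=True)

X = ["P_Football_Performance", "SMAn2"]
y = "N_Applications"

data_X = msu_df[X]  # 2-D features (DataFrame)
data_y = msu_df[y]  # 1-D target (Series), no reshape needed

# Train first ...
mlp = MLPRegressor(hidden_layer_sizes=(6,), max_iter=2000, random_state=0)
mlp.fit(data_X, data_y)

# ... then predict on a new row with the same feature columns.
newData = pd.DataFrame(
    {"P_Football_Performance": 0.364, "SMAn2": 17198}, index=[2022]
)
prediction = mlp.predict(newData)
print(prediction)
```

With so few rows and unscaled features, scikit-learn may emit a convergence warning here; that is expected for this toy data and does not affect the shape of the result.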