python pandas numpy keyerror

MLPRegressor result


This is an example from a book that I can't get to work. I managed to solve other cases, but this one has become a challenge. When I run it, I get this error:

Traceback (most recent call last):
  File "c:\BAULO\PYTHON\ESTRUCTURAS\P_ML\UNIDAD6\Book16.py", line 14, in <module>
    data_y = msu_df[w]
  File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3810, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 6111, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "C:\Users\Mauri\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\base.py", line 6171, in _raise_if_missing
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index([('N_Applications',)], dtype='object')] are in the [columns]"

The csv can be downloaded from: https://github.com/PacktPublishing/Hands-On-Data-Preprocessing-in-Python/blob/main/Chapter06/MSU%20applications.csv

I tried axis=1 and reshape, but I can't figure out the error. I know this topic has already been discussed but what I found doesn't work for me either.

import pandas as pd
import numpy as np
from sklearn.neural_network import MLPRegressor

msu_df = pd.read_csv('MSU applications.csv')
msu_df.set_index('Year', drop=True, inplace=True)

X = ['P_Football_Performance','SMAn2']
y = 'N_Applications'

w = np.reshape(y, (1,-1))

data_X = msu_df[X]
data_y = msu_df[w]

mlp = MLPRegressor(hidden_layer_sizes=6, max_iter=100000)
print(mlp.predict(mlp.fit(data_X, data_y)))

Solution

  • You are doing an unnecessary step here:

    w = np.reshape(y, (1,-1))
    

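    That step turns the string y into a 2-D NumPy array of shape (1, 1). When msu_df is indexed with that array, pandas treats its single row as a tuple key, ('N_Applications',), which matches no column, hence the KeyError in your traceback. Pass y directly to select the label column from the dataframe msu_df: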

    y = 'N_Applications'
    data_y = msu_df[y]
    

    Additionally, you are passing the result of fit() to predict(), which is not the correct way to make a prediction. fit() is the training stage: it learns from the training data (here data_X and data_y) and returns the fitted estimator itself, not data, so predict() cannot use that return value as input. You would normally make predictions on new, unseen data rather than on the data the model was trained on. Replace this line:

    print(mlp.predict(mlp.fit(data_X, data_y)))
    

    with this:

    mlp.fit(data_X, data_y)
    

    Sample code for prediction from the tutorial notebook you are following:

    newData = pd.DataFrame({'P_Football_Performance': 0.364, 'SMAn2': 17198},
                           index=[2022])
    mlp.predict(newData)
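
Putting both fixes together, here is a minimal sketch of the corrected flow. It assumes a small made-up DataFrame in place of 'MSU applications.csv' (the numbers are illustrative only, not the book's data), and it lowers max_iter and sets random_state so the sketch runs quickly and repeatably; neither change comes from the question. The failed lookup is caught broadly because the exact exception type may differ between pandas versions.

```python
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

# Made-up stand-in for 'MSU applications.csv'; values are illustrative only.
msu_df = pd.DataFrame(
    {
        "P_Football_Performance": [0.20, 0.35, 0.50, 0.65, 0.80, 0.45],
        "SMAn2": [15000.0, 15500.0, 16000.0, 16500.0, 17000.0, 16200.0],
        "N_Applications": [15200.0, 15900.0, 16400.0, 16900.0, 17500.0, 16600.0],
    },
    index=pd.Index([2016, 2017, 2018, 2019, 2020, 2021], name="Year"),
)

X = ["P_Football_Performance", "SMAn2"]
y = "N_Applications"

# The reshaped key is a 2-D array rather than a column label, so the lookup fails.
w = np.reshape(y, (1, -1))
try:
    msu_df[w]
    lookup_failed = False
except Exception as exc:  # KeyError on the pandas version shown in the traceback
    lookup_failed = True
    print(type(exc).__name__, exc)

data_X = msu_df[X]  # list of labels  -> DataFrame with two feature columns
data_y = msu_df[y]  # single label    -> Series with the target values

# max_iter and random_state are assumptions made for this sketch only.
mlp = MLPRegressor(hidden_layer_sizes=6, max_iter=5000, random_state=0)
fitted = mlp.fit(data_X, data_y)  # fit() returns the estimator itself

# Predict on a new row whose columns match the training features.
new_data = pd.DataFrame(
    {"P_Football_Performance": 0.364, "SMAn2": 17198}, index=[2022]
)
prediction = mlp.predict(new_data)
print(prediction)
```

With so few rows and unscaled inputs, scikit-learn may emit a ConvergenceWarning and the predicted value is not meaningful; the point is only the order of operations: select columns by label, fit, then predict on separate data.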