machine-learning scikit-learn jupyter-notebook feature-engineering

dead kernel when doing feature engineering?


I am working on a prediction problem. My training set has around 8,700 samples and around 1,000 features. I have tried different models, but the results still show high bias, so I decided to add new features. I added some lagged versions of the existing features and then used sklearn's PolynomialFeatures to generate polynomial features (degree=2):

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
X = pd.DataFrame(X_poly, columns=poly.get_feature_names_out(), index=X.index)
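
For reference, with n input columns, PolynomialFeatures with degree=2 (and the default bias term) produces (n+1)(n+2)/2 output columns, so the count grows quadratically. A quick sanity check (the n below is just illustrative of my roughly 1,000 inputs):

n = 1000                         # illustrative number of input columns
print((n + 1) * (n + 2) // 2)    # 501501, i.e. on the order of 500,000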

Now I have around 490,000 features. Next, when I try to scale the features,

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X)

the Jupyter Notebook kernel dies with a "Dead kernel" message and I cannot go any further.


What should I do? Any suggestions?


Solution

  • You need to do batch processing: fit the scaler incrementally with partial_fit on chunks of rows, and then transform in chunks as well (also in a loop, see the sketch after the fitting code below). The scaled output alone is roughly 8,700 × 490,000 float64 values (tens of gigabytes), and fit_transform tries to allocate all of it at once on top of X, which is most likely what kills the kernel:

    from sklearn.preprocessing import StandardScaler

    scaler = StandardScaler()

    n = X.shape[0]         # number of rows
    batch_size = 1000      # tune to your available memory
    i = 0

    # First pass: accumulate the mean/std statistics chunk by chunk.
    while i < n:
        partial_size = min(batch_size, n - i)
        partial_x = X.iloc[i:i + partial_size]
        scaler.partial_fit(partial_x)
        i += partial_size
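
    A minimal sketch of the second pass, assuming X is an all-numeric pandas DataFrame: write each scaled chunk back into X in place, so a second full-size array never has to be allocated.

    # Second pass: transform chunk by chunk with the already-fitted scaler,
    # overwriting the rows in place (assumes every column is a float column,
    # which is the case for PolynomialFeatures output).
    i = 0
    while i < n:
        partial_size = min(batch_size, n - i)
        X.iloc[i:i + partial_size] = scaler.transform(X.iloc[i:i + partial_size])
        i += partial_size

    If you would rather keep the original X untouched, you can instead collect the scaler.transform(...) output of each chunk and stack the pieces afterwards, but note that this temporarily needs memory for both copies.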