I am building an autoencoder model. Before training it, I scaled the data using a min-max scaler, and I have since saved the trained model.
X_train = df.values
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
After that, I fitted the model and saved it as an .h5 file. Now, when I pass in test data after loading the saved model, that data naturally needs to be scaled as well.
So when I load the model and scale the test data using
X_test_scaled = scaler.transform(X_test)
I get the error:
NotFittedError: This MinMaxScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
So I tried X_test_scaled = scaler.fit_transform(X_test) instead (I had a hunch this was foolish). After loading the saved model and testing, it did give a result, but a different one from what I got when I trained and tested the model in the same session. I have saved around 4000 models by now for my purpose, so I can't train and save them all again; that would take far too much time. I want a way out.
Is there a way to scale the test data with the same transformation I applied during training (maybe by saving the scaling values, I don't know)? Or, alternatively, can I somehow "descale" the model so that I can test it on unscaled data?
If I under-emphasized or over-emphasized any point, please let me know in the comments!
X_test_scaled = scaler.fit_transform(X_test) will scale X_test using the minimum and maximum values of the features in X_test, not those in X_train.
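To see why this matters, here is a small NumPy sketch of the per-column formula MinMaxScaler applies with its default feature_range=(0, 1), namely x' = (x - min) / (max - min). The helper names min_max_params and min_max_apply and the toy arrays are hypothetical, chosen only for illustration.

```python
import numpy as np

def min_max_params(X):
    """Per-column minimum and range, i.e. what a min-max 'fit' learns."""
    data_min = X.min(axis=0)
    data_range = X.max(axis=0) - data_min
    # Avoid division by zero for constant columns (MinMaxScaler handles this similarly)
    data_range = np.where(data_range == 0, 1.0, data_range)
    return data_min, data_range

def min_max_apply(X, data_min, data_range):
    """Scale X with previously learned parameters (the 'transform' step)."""
    return (X - data_min) / data_range

# Hypothetical toy data: one feature, test values span a narrower range
X_train = np.array([[0.0], [5.0], [10.0]])
X_test = np.array([[2.0], [4.0]])

# Correct: test data scaled with the parameters learned from the training data
correct = min_max_apply(X_test, *min_max_params(X_train))
# Incorrect: parameters re-learned from the test data itself (what fit_transform does)
wrong = min_max_apply(X_test, *min_max_params(X_test))

print(correct.ravel())  # [0.2 0.4]
print(wrong.ravel())    # [0. 1.]
```

The autoencoder only ever saw inputs scaled with the training parameters, so feeding it the "wrong" version shifts every value and changes its output accordingly.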
Your original code probably failed because you did not save scaler after fitting it to X_train, or because you overwrote it somehow (e.g., by re-initializing it). Either way, scaler had not been fitted to any data when you called transform, which is why the error was thrown.
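One way to catch this situation before calling transform is to check whether the scaler has learned any parameters. A minimal sketch using scikit-learn's check_is_fitted helper; the is_fitted wrapper and the toy array are hypothetical.

```python
import numpy as np
from sklearn.exceptions import NotFittedError
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils.validation import check_is_fitted

def is_fitted(estimator):
    """Return True if the estimator has been fitted, False otherwise."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False

X_train = np.array([[0.0], [5.0], [10.0]])

# A freshly created scaler, e.g. one re-initialized in a separate test script
fresh = MinMaxScaler()
print(is_fitted(fresh))   # False: fresh.transform(...) would raise NotFittedError

# A scaler that has been fitted on the training data
fitted = MinMaxScaler().fit(X_train)
print(is_fitted(fitted))  # True
```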
When you then call X_test_scaled = scaler.fit_transform(X_test), you are fitting scaler to X_test and transforming X_test at the same time. That is why the code was able to run, but, as you already surmised, this step is incorrect.
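If the scaler object itself was never saved but the exact training data behind each model is still available, refitting a new MinMaxScaler on that same X_train (not on X_test) should recover identical parameters, because min-max fitting only records per-column minimums and maximums and involves no randomness. This assumes the rows, columns, and column order are exactly the same as at training time; it would avoid retraining the models. A sketch with hypothetical toy arrays standing in for df.values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the original training data (df.values)
X_train = np.array([[1.0, 200.0], [3.0, 100.0], [5.0, 300.0]])
X_test = np.array([[2.0, 150.0]])

# Scaler fitted when the model was trained (assume it was later lost)
original = MinMaxScaler().fit(X_train)

# Re-create it later by fitting again on the SAME training data
recovered = MinMaxScaler().fit(X_train)

# The learned per-column statistics match exactly...
assert np.array_equal(original.data_min_, recovered.data_min_)
assert np.array_equal(original.data_max_, recovered.data_max_)

# ...so test data is scaled the same way it would have been at training time
print(recovered.transform(X_test))  # [[0.25 0.25]]
```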
What you want is this, in your training script:
import pickle as pkl
from sklearn.preprocessing import MinMaxScaler
X_train = df.values
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Save the fitted scaler so the test script can reuse it
with open("scaler.pkl", "wb") as outfile:
    pkl.dump(scaler, outfile)
# Some other code for training your autoencoder
# ...
Then, in your test script:
# During test time
import pickle as pkl
# Load the scaler that was fitted on the training data
with open("scaler.pkl", "rb") as infile:
    scaler = pkl.load(infile)
X_test_scaled = scaler.transform(X_test)  # Note: transform, not fit_transform
Note that you don't have to re-fit the scaler object after loading it back from disk. It already contains everything it learned from the training data (the per-feature minimums, maximums, scaling factors, etc.). You just call scaler.transform on X_test.
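As a quick check of this, the following sketch pickles a fitted scaler into an in-memory buffer (io.BytesIO, standing in for scaler.pkl on disk), loads it back, and calls transform without any refitting. The toy arrays are hypothetical.

```python
import io
import pickle as pkl

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[0.0, 10.0], [4.0, 30.0], [8.0, 20.0]])
X_test = np.array([[2.0, 25.0], [6.0, 15.0]])

# Training script: fit on the training data only, then save
scaler = MinMaxScaler().fit(X_train)
buffer = io.BytesIO()  # stands in for open("scaler.pkl", "wb")
pkl.dump(scaler, buffer)

# Test script: load the fitted scaler and call transform (no fit)
buffer.seek(0)         # rewind, like re-opening the file for reading
loaded = pkl.load(buffer)
X_test_scaled = loaded.transform(X_test)

print(X_test_scaled)
```

Each column of X_test is scaled with the training minimum and maximum (0 to 8 and 10 to 30 here), exactly as the scaler would have done in the original training session.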