machine-learning, scaling, sampling, smote

Python: should Data Scaling be done before Sampling in Machine Learning?


When should I do data scaling and sampling, given that my data is imbalanced? Should I scale the data first and then sample?


Solution

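  • You probably want to standardize/scale your independent variables (features) after sampling/splitting: fit the scaler on the training data only, then reuse that fitted scaler to transform the test data, so that no information from the test set leaks into training.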

    If you're working in Python, scikit-learn.org has a few examples that might address your issue a little better. Here's an example solution that deals with the importance of feature scaling.

    Here's another one that includes stratified sampling.
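
The ordering suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the method from the linked examples: the data is synthetic (`make_classification`), the resampling step is plain random oversampling via `sklearn.utils.resample` standing in for whatever sampler you actually use (e.g. SMOTE), and all variable names are made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 90% / 10%), purely illustrative.
X, y = make_classification(
    n_samples=1000, n_features=5, weights=[0.9, 0.1], random_state=0
)

# 1. Stratified split, so both parts keep the original class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# 2. Resample the training part only (assumption: random oversampling
#    of the minority class, used here as a stand-in for SMOTE).
minority = y_train == 1
n_majority = int((~minority).sum())
X_min_up, y_min_up = resample(
    X_train[minority],
    y_train[minority],
    replace=True,
    n_samples=n_majority,
    random_state=0,
)
X_res = np.vstack([X_train[~minority], X_min_up])
y_res = np.concatenate([y_train[~minority], y_min_up])

# 3. Fit the scaler on the resampled training data only,
#    then apply the same fitted scaler to the untouched test data.
scaler = StandardScaler().fit(X_res)
X_res_scaled = scaler.transform(X_res)
X_test_scaled = scaler.transform(X_test)
```

The test set is never resampled and never used to fit the scaler; it is only transformed with statistics learned from the training data.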