python, machine-learning, dimensionality-reduction

How to choose n_components in TruncatedSVD


I want to use TruncatedSVD to reduce the dimensionality of my dataset, but I can't work out how to choose the best value for n_components. Can anyone help me?


Solution

  • This is a hyperparameter of your model, so there is no single right answer. What you will likely want to do is split your dataset into training, validation, and test sets, then grid-search the number of components in TruncatedSVD, using the validation set to compare the candidates.

    The basic pipeline is:

    1. Train your model using only your training set, starting with some initial value for the number of components.

    2. Evaluate your model's performance on the validation set. Then go back to step 1, try a different number of components, and evaluate on the validation set again. Repeat until you have searched a reasonable range of values, and choose the number of components that gives the highest performance on the validation set.

    3. Evaluate your model on the test set. This is your final model performance.
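
The three steps above can be sketched roughly as follows. This is only an illustration, not a definitive recipe: the synthetic dataset from make_classification, the LogisticRegression classifier, the 60/20/20 split, and the candidate list are all assumptions chosen for the example, so substitute your own data, model, and search range.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumption: a synthetic dataset stands in for your real one.
X, y = make_classification(
    n_samples=600, n_features=50, n_informative=10, random_state=0
)

# Split into 60% training, 20% validation, 20% test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=0
)

# Assumption: an arbitrary grid of candidate values, all below n_features.
candidates = [2, 5, 10, 20, 40]

val_scores = {}
models = {}
for k in candidates:
    # Step 1: fit the SVD and the downstream model on the training set only.
    model = make_pipeline(
        TruncatedSVD(n_components=k, random_state=0),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)
    # Step 2: score this candidate on the validation set (accuracy here).
    val_scores[k] = model.score(X_val, y_val)
    models[k] = model

# Pick the number of components with the highest validation score.
best_k = max(val_scores, key=val_scores.get)

# Step 3: report the chosen model's performance on the held-out test set.
test_score = models[best_k].score(X_test, y_test)

print("validation scores:", val_scores)
print("best n_components:", best_k)
print("test accuracy:", test_score)
```

If you would rather use cross-validation than a single validation split, the same search can be run by wrapping the pipeline in scikit-learn's GridSearchCV and passing a parameter grid keyed on truncatedsvd__n_components.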