machine-learning svm libsvm

Is it required to shuffle the training data for SVM multi-classification?


I am using OpenCV's SVM Python interface to classify data into 4 categories. When the training data is ordered by label (all samples of label 1, then label 2, label 3, and label 4), accuracy is low, only about 50%. When I shuffle the training data, the result is reasonable, about 90% correct. So my question is: does the order of the training data affect the final result, and do I need to shuffle the data before training?


Solution

  • No, the ordering does not change the SVM training itself, although some parameter-tuning methods used in your code can depend on it. For example, if you use cross-validation without randomization, then an ordered set is much harder to handle (each consecutive fold can even have 0 samples of some classes!).
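
To make the cross-validation point concrete, here is a minimal sketch in plain Python. It assumes a hypothetical data set of 4 classes with 10 samples each and a simple contiguous 4-fold split; the helper names `contiguous_folds` and `classes_missing_from_training` are made up for illustration and are not part of OpenCV or LIBSVM. Whether your tuning routine splits folds this way is an assumption you would need to check.

```python
import random
from collections import Counter

def contiguous_folds(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous validation folds."""
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder.
        stop = n_samples if i == k - 1 else start + fold_size
        folds.append(list(range(start, stop)))
    return folds

def classes_missing_from_training(labels, k):
    """For each fold, list the classes that never appear in its training part."""
    all_classes = set(labels)
    missing = []
    for val_idx in contiguous_folds(len(labels), k):
        val = set(val_idx)
        train_classes = {labels[i] for i in range(len(labels)) if i not in val}
        missing.append(sorted(all_classes - train_classes))
    return missing

# Hypothetical data: 4 classes, 10 samples each, stored in label order.
ordered = [c for c in (1, 2, 3, 4) for _ in range(10)]

# Same labels, shuffled with a fixed seed so the run is repeatable.
shuffled = ordered[:]
random.Random(0).shuffle(shuffled)

missing_ordered = classes_missing_from_training(ordered, k=4)
missing_shuffled = classes_missing_from_training(shuffled, k=4)

print(missing_ordered)   # [[1], [2], [3], [4]]
print(missing_shuffled)
print(Counter(shuffled[:10]))  # class mix of the first shuffled fold
```

With the ordered labels, every fold validates on a class that was completely absent from its training part, so any parameters chosen by such a procedure are chosen on meaningless scores. After shuffling, each training part will almost always contain all four classes.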

    In short: