machine-learning, classification, svm, svmlight

How can I provide a cost factor for balancing training on an imbalanced training dataset, as available in SVMlight?


The cost parameter in e1071's SVM doesn't seem to be the same as SVMlight's cost. The manual of the e1071 library gives the following definition for its cost parameter:

cost of constraints violation (default: 1) — it is the ‘C’-constant of the regularization term in the Lagrange formulation

This is basically the penalty for allowing misclassification. SVMlight, however, provides a separate weight, described in its manual as:

Cost: cost-factor, by which training errors on
      positive examples outweight errors on negative
      examples (default 1)

This cost-factor is basically there to allow balancing when the training data doesn't have an equal number of positive and negative data points. Is there anything similar in e1071's SVM implementation?


Solution

  • You probably want to look at the class.weights argument (which is explained on the help page).

    Best David
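
    For illustration, here is a minimal sketch (not part of the original answer) of how class.weights in e1071's svm() can up-weight the rare class, playing a role similar to SVMlight's cost-factor. The toy data and the weight value of 9 are assumptions chosen only to show the mechanics:

    ## A sketch, assuming a made-up imbalanced toy data set (180 negatives, 20 positives)
    library(e1071)

    set.seed(1)
    x <- rbind(matrix(rnorm(360), ncol = 2),            # negative class
               matrix(rnorm(40, mean = 2), ncol = 2))   # positive class
    y <- factor(c(rep("neg", 180), rep("pos", 20)))

    ## class.weights scales the per-class penalty: here errors on the rare
    ## positive class count 9x as much, roughly the 180/20 class ratio.
    model <- svm(x, y,
                 cost = 1,                               # C of the regularization term
                 class.weights = c(neg = 1, pos = 9))

    table(predicted = predict(model, x), actual = y)

    Setting the positive-class weight to roughly the ratio of negative to positive examples is a common heuristic; the value can also be treated as a tuning parameter if needed.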