feature-selectionone-class-classification

Feature selection for one class classification


I try to apply One Class SVM but my dataset contains too many features and I believe feature selection would improve my metrics. Are there any methods for feature selection that do not need the label of the class? If yes and you are aware of an existing implementation please let me know


Solution

  • You'd probably get better answers asking this on Cross Validated instead of Stack Exchange, although since you ask for implementations I will answer your question.

    Unsupervised methods exist that allow you to eliminate features without looking at the target variable. This is called unsupervised data (dimensionality) reduction. They work by looking for features that convey similar information and then either eliminate some of those features or reduce them to fewer features whilst retaining as much information as possible.

    Some examples of data reduction techniques include PCA, redundancy analysis, variable clustering, and random projections, amongst others.

    You don't mention which program you're working in but I am going to presume it's Python. sklearn has implementations for PCA and SparseRandomProjection. I know there is a module designed for variable clustering in Python but I have not used it and don't know how convenient it is. I don't know if there's an unsupervised implementation of redundancy analysis in Python but you could consider making your own. Depending on what you decide to do it might not be too tricky (especially if you just do correlation based).

    In case you're working in R, finding versions of data reduction using PCA will be no problem. For variable clustering and redundancy analysis, great packages like Hmisc and ClustOfVar exist.

    You can also read about other unsupervised data reduction techniques; you might find other methods more suitable.