
PyML: graphing the decision surface


PyML has a function for graphing decision surfaces.

First you need to tell PyML which data to use. Here I use a SparseDataSet containing my feature vectors; this is the same data set I used to train my SVM.

from PyML.demo import demo2d
demo2d.setData(training_vector)

Then you need to tell it which classifier to use. I pass it a trained SVM.

demo2d.decisionSurface(best_svm, fileName = "dec.pdf")

However, I get this error message:

Traceback (most recent call last):
**deleted by The Unfun Cat**
    demo2d.decisionSurface(best_svm, fileName = "dec.pdf")
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyML/demo/demo2d.py", line 140, in decisionSurface
    results = classifier.test(gridData)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyML/evaluators/assess.py", line 45, in test
    classifier.verifyData(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/PyML/classifiers/baseClassifiers.py", line 55, in verifyData
    if len(misc.intersect(self.featureID, data.featureID)) != len(self.featureID) :
AttributeError: 'SVM' object has no attribute 'featureID'

Solution

  • I'm going to dive right into the source, because I have never used PyML. I tried to find it online, but I couldn't track down the verifyData method in the PyML 0.7.2 source available online, so I had to search through a downloaded copy of the source.

    A classifier's featureID is only set in the baseClassifier class's train method (lines 77-78):

    if data.__class__.__name__ == 'VectorDataSet' :
        self.featureID = data.featureID[:]
    

    In your code, data.__class__.__name__ evaluates to "SparseDataSet" (or whatever other class you are using), so the condition is False and featureID is never set.
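    A minimal sketch of that condition, using empty stand-in classes (the SparseDataSet, VectorDataSet and MyVectorDataSet below are hypothetical placeholders, not PyML's real classes, and sets_feature_id is an illustrative helper). Because the check compares class names as strings, even a subclass of VectorDataSet fails it, whereas isinstance would accept the subclass.

```python
# Hypothetical stand-ins for PyML's data set classes; only the class names matter here.
class VectorDataSet(object):
    pass


class SparseDataSet(object):
    pass


class MyVectorDataSet(VectorDataSet):
    """Hypothetical subclass, to show how a name-based check treats subclasses."""
    pass


def sets_feature_id(data):
    """Mirror the condition in train(): featureID is copied only when this is True."""
    return data.__class__.__name__ == 'VectorDataSet'


for d in (VectorDataSet(), SparseDataSet(), MyVectorDataSet()):
    # class name, result of the name-based check, result of an isinstance check
    print(type(d).__name__, sets_feature_id(d), isinstance(d, VectorDataSet))
```

    This prints True only for the exact VectorDataSet class; the subclass passes isinstance but not the name comparison.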

    Then in demo2d.decisionSurface:

    gridData = VectorDataSet(gridX)
    gridData.attachKernel(data.kernel)
    results = classifier.test(gridData)
    

    This tests your classifier on a VectorDataSet. Here classifier.test amounts to a call to assess.test, which checks that the test data has the same features as the training data by calling baseClassifier.verifyData:

    def verifyData(self, data) :
        if data.__class__.__name__ != 'VectorDataSet' :
            return
        if len(misc.intersect(self.featureID, data.featureID)) != len(self.featureID) :
            raise ValueError, 'missing features in test data'
    

    verifyData checks the class of the data it receives, which is now "VectorDataSet", and so goes on to read the featureID attribute that was never created. That is the AttributeError in your traceback.
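    The chain above can be reproduced without PyML. This is a simplified model, not PyML itself: DataSet, SparseDataSet and VectorDataSet are hypothetical stand-ins that only carry a list of feature IDs, SVM keeps just the featureID logic of train and verifyData, and a set difference stands in for misc.intersect.

```python
# Simplified model of the two PyML code paths quoted above; NOT PyML itself.


class DataSet(object):
    """Hypothetical stand-in: a data set is just a list of feature IDs here."""
    def __init__(self, featureID):
        self.featureID = list(featureID)


class SparseDataSet(DataSet):
    pass


class VectorDataSet(DataSet):
    pass


class SVM(object):
    def train(self, data):
        # featureID is recorded only for a VectorDataSet (train, lines 77-78)
        if data.__class__.__name__ == 'VectorDataSet':
            self.featureID = data.featureID[:]

    def verifyData(self, data):
        # Anything other than a VectorDataSet skips the check entirely...
        if data.__class__.__name__ != 'VectorDataSet':
            return
        # ...but a VectorDataSet makes us read self.featureID, which may not exist
        if set(self.featureID) - set(data.featureID):
            raise ValueError('missing features in test data')


svm = SVM()
svm.train(SparseDataSet(['x', 'y']))   # featureID is never set
try:
    svm.verifyData(VectorDataSet(['x', 'y']))  # like gridData in decisionSurface
except AttributeError as exc:
    print(exc)  # 'SVM' object has no attribute 'featureID'
```

    Training on a VectorDataSet instead sets featureID, and the same verifyData call returns without error.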

    Basically, it's either a bug or a hidden feature.

    Long story short, you have to convert your data to a VectorDataSet, because SVM.featureID is not set otherwise.

    Also, you don't need to pass in a trained classifier: decisionSurface trains the classifier for you.
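    To make those two points concrete, here is a hypothetical, heavily simplified model of the flow through decisionSurface. It assumes, as described in this answer, that decisionSurface trains the classifier on whatever was passed to setData and then tests it on a VectorDataSet grid. All classes and functions below are stand-ins, not PyML's code, and the grid is reduced to a data set with the same feature IDs.

```python
# Simplified, hypothetical model of demo2d's flow; NOT PyML's actual code.


class DataSet(object):
    """Hypothetical stand-in: a data set is just a list of feature IDs here."""
    def __init__(self, featureID):
        self.featureID = list(featureID)


class SparseDataSet(DataSet):
    pass


class VectorDataSet(DataSet):
    pass


class SVM(object):
    def train(self, data):
        # Same name-based check as PyML's train: only a VectorDataSet records featureID
        if data.__class__.__name__ == 'VectorDataSet':
            self.featureID = data.featureID[:]

    def test(self, data):
        # Stands in for assess.test -> verifyData; raises AttributeError if featureID is unset
        if data.__class__.__name__ == 'VectorDataSet':
            if not set(self.featureID) <= set(data.featureID):
                raise ValueError('missing features in test data')
        return 'tested'


data = None


def setData(data_):
    global data
    data = data_


def decisionSurface(classifier):
    classifier.train(data)                    # the classifier is trained here
    gridData = VectorDataSet(data.featureID)  # grid points use the same features
    return classifier.test(gridData)


setData(VectorDataSet(['x', 'y']))
print(decisionSurface(SVM()))  # prints "tested": an untrained SVM is fine
```

    Calling setData with a SparseDataSet instead makes the same decisionSurface call fail with the AttributeError from the question.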

    Edit:

    I would also like to draw attention to the setData method:

    def setData(data_) :
        global data
        data = data_
    

    There is no type checking at all, so data could be set to anything (an integer, a string, etc.), which would then cause an error in decisionSurface.

    If you are going to use setData, you must use it carefully (only with a VectorDataSet), because the code is not as flexible as you would like.