[SOLVED] Designing a Kernel for a support vector machine (XOR)

Designing a Kernel for a support vector machine (XOR)

The meat of my question is "how does one design a kernel function for a learning problem?"

As a quick background, I'm reading books on support vector machines and kernel machines, and everywhere I look authors give examples of kernels (polynomial kernels both homogeneous and nonhomogeneous, gaussian kernels, and allusions to text-based kernels to name a few), but all either provide pictures of the results without specifying the kernel, or vaguely claim that "an efficient kernel can be constructed". I'm interested in the process that goes on when one designs a kernel for a new problem.

Probably the easiest example is learning XOR, a smallest (4 points) non-linear data set as embedded the real plane. How would one come up with a natural (and non-trivial) kernel to linearly separate this data?

As a more complex example (see Cristianini, Introduction to SVMs, figure 6.2), how would one design a kernel to learn a checkerboard pattern? Cristianini states the picture was derived "using Gaussian kernels" but it seems that he uses multiple, and they are combined and modified in an unspecified way.

If this question is too broad to answer here, I'd appreciate a reference to the construction of one such kernel function, though I'd prefer the example be somewhat simple.

Solution

Q: "How does one design a kernel function for a learning problem?"

A: "Very carefully"

Trying the usual suspects (linear, polynomial, RBF) and using whichever works the best really is sound advice for someone trying to get the most accurate predictive model they can. For what it's worth it's a common criticism of SVMs that they seem to have a lot of parameters that you need to tune empirically. So at least you're not alone.

If you really want to design a kernel for a specific problem then you are right, it is a machine learning problem all in itself. It's called the 'model selection problem'. I'm not exactly an expert myself here, but the best source of insight into kernel methods for me was the book 'Gaussian Processes' by Rasumussen and Williams (it's freely available online), particularly chapters 4 and 5. I'm sorry that I can't say much more than 'read this huge book full of maths' but it's a complicated problem and they do a really good job of explaining it.