r, svm, libsvm, kernlab

Equation of rbfKernel in kernlab is different from the standard?


I have observed that kernlab defines its RBF kernel as

rbf(x,y) = exp(-sigma * euclideanNorm(x-y)^2)

but according to the Wikipedia article on the RBF kernel, it should have the form

rbf(x,y) = exp(-euclideanNorm(x-y)^2/(2*sigma^2))

which is also more intuitive, since two close samples combined with a large kernel sigma value will then yield a higher similarity.

I am not sure what e1071's svm uses (the native libsvm code?).

I hope someone can enlighten me on why there is a difference. I caught this because I was initially using e1071, then switched to ksvm and saw inconsistent results between the two. A small example for comparison:

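library(kernlab)

set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
sigma <- 100

# kernlab's RBF kernel: exp(-sigma * ||x - y||^2)
rbf <- rbfdot(sigma = sigma)
rbf(x, y)

# Standard-deviation form: exp(-||x - y||^2 / (2 * sigma^2))
exp(-sum((x - y)^2) / (2 * sigma^2))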

I would expect the kernel value to be close to 1 (since x and y are drawn from a normal distribution with standard deviation 1, while the kernel sigma is 100). This is observed only in the second case.


Solution

  • I came across that discrepancy too, and I wound up digging into the source to figure out whether there was a typo in the documentation or what exactly was going on. After all, sigma in the context of Gaussians traditionally denotes the standard deviation and appears in the denominator.

    Here's the relevant source:

    **kernlab\R\kernels.R**
    ## Define the kernel objects,
    ## functions with an additional slot for the kernel parameter list.
    ## kernel functions take two vector arguments and return a scalar (dot product)
    
    
    rbfdot<- function(sigma=1)
      {
    
        rval <- function(x,y=NULL)
        {
          if(!is(x,"vector")) stop("x must be a vector")
          if(!is(y,"vector")&&!is.null(y)) stop("y must a vector")
          if (is(x,"vector") && is.null(y)){
            return(1)
          }
          if (is(x,"vector") && is(y,"vector")){
            if (!length(x)==length(y))
              stop("number of dimension must be the same on both data points")
            return(exp(sigma*(2*crossprod(x,y) - crossprod(x) - crossprod(y))))  
            # sigma/2 or sigma ??
          }
        }
         return(new("rbfkernel",.Data=rval,kpar=list(sigma=sigma)))
      }
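
    To connect the return line with the formula in the question: for vectors, crossprod(x, y) is the inner product x^T y, so the exponent can be expanded as below. This is a short derivation rather than anything taken from the kernlab documentation, and σ here stands for kernlab's sigma argument.

    ```latex
    \begin{aligned}
    2\,x^\top y - x^\top x - y^\top y &= -(x - y)^\top (x - y) = -\lVert x - y \rVert^2, \\
    \exp\bigl(\sigma\,(2\,x^\top y - x^\top x - y^\top y)\bigr) &= \exp\bigl(-\sigma\,\lVert x - y \rVert^2\bigr).
    \end{aligned}
    ```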
    

    Their comment "sigma/2 or sigma ??" suggests that the authors themselves may have been unsure which convention to adopt. The presence of /2 would be consistent with the standard deviation form /(2*sigma^2), but this is only speculation on my part.

    Another corroborating piece of evidence is the help page ?rbfdot, which reads:

    sigma The inverse kernel width used by the Gaussian the Laplacian, the Bessel and the ANOVA kernel

    That is consistent with the form they use, with sigma in the numerator: in the denominator it would scale proportionately with the width of the Gaussian. So it indeed looks like they settled on the convention that the Wikipedia article describes as the gamma form, where they say

    An equivalent, but simpler, definition involves a parameter gamma = -1/(2*sigma^2)

    So the difference just seems to be a matter of adopting different but equivalent conventions: kernlab's sigma plays the role of 1/(2*sigma^2) in the standard deviation form. One motivation for this particular convention (which someone may confirm in a comment) may be code reuse and consistency, since, as the help page shows, the same parameter is used by three other kernels whose parameters may more traditionally appear in the numerator. I'm not sure about that, however, since I've never used those other kernels and am unfamiliar with them.
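
The equivalence of the two conventions can be checked numerically. The following is a minimal base-R sketch, not kernlab code: the function names rbf_inverse_width and rbf_sd_width are made up for illustration, and the first one simply mirrors the form exp(-sigma * ||x - y||^2) from the question.

```r
# Base-R sketch of the two parameterizations; the function names are
# illustrative and are not part of kernlab or e1071.

# kernlab-style form: sigma acts as an inverse width.
rbf_inverse_width <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

# Standard-deviation form: s acts as a width.
rbf_sd_width <- function(x, y, s) {
  exp(-sum((x - y)^2) / (2 * s^2))
}

set.seed(123)
x <- rnorm(3)
y <- rnorm(3)
s <- 100

# Converting the width s into an inverse width makes the two forms agree.
k_inverse <- rbf_inverse_width(x, y, sigma = 1 / (2 * s^2))
k_sd <- rbf_sd_width(x, y, s = s)
all.equal(k_inverse, k_sd)
```

Given the source quoted above, calling rbfdot(sigma = 1 / (2 * s^2))(x, y) with kernlab loaded should return the same value, so passing the converted parameter is one way to reproduce the standard deviation form.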