Kernel Functions

Below is a list of some kernel functions available in the existing literature. As in previous articles, each formula below is given in LaTeX notation. I cannot guarantee that all of them are perfectly correct, so use them at your own risk. Most of them have links to the articles where they were originally used or proposed.

1. Linear Kernel

The Linear kernel is the simplest kernel function. It is given by the inner product <x, y> plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts; for example, kernel PCA (KPCA) with a linear kernel is the same as standard PCA.

k(x, y) = x^T y + c

2. Polynomial Kernel

The Polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited for problems where all the training data is normalized.

k(x, y) = (\alpha x^T y + c)^d
Adjustable parameters are the slope alpha, the constant term c and the polynomial degree d.
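
As a quick illustration, here is a minimal NumPy sketch of the Linear and Polynomial kernels (function names and default parameter values are my own choices for illustration, not taken from any of the referenced papers):

import numpy as np

def linear_kernel(x, y, c=0.0):
    # Linear kernel: k(x, y) = <x, y> + c, with c an optional constant.
    return np.dot(x, y) + c

def polynomial_kernel(x, y, alpha=1.0, c=1.0, d=3):
    # Polynomial kernel: k(x, y) = (alpha * <x, y> + c)^d.
    return (alpha * np.dot(x, y) + c) ** d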

3. Gaussian Kernel

The Gaussian kernel is an example of a radial basis function (RBF) kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)

Alternatively, it could also be implemented using

k(x, y) = \exp(-\gamma \|x - y\|^2), where \gamma = \frac{1}{2\sigma^2}

The adjustable parameter sigma plays a major role in the performance of the kernel and should be carefully tuned to the problem at hand. If overestimated, the exponential will behave almost linearly and the higher-dimensional projection will start to lose its non-linear power. On the other hand, if underestimated, the function will lack regularization and the decision boundary will be highly sensitive to noise in the training data.
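
As a sketch of the two equivalent parameterizations above (the sigma and gamma defaults are arbitrary illustrative values):

import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF) kernel: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def gaussian_kernel_gamma(x, y, gamma=0.5):
    # Same kernel written with gamma = 1 / (2 * sigma^2).
    return np.exp(-gamma * np.linalg.norm(x - y) ** 2)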

4. Exponential Kernel

The exponential kernel is closely related to the Gaussian kernel, with only the square of the norm left out. It is also a radial basis function kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|}{2\sigma^2}\right)

5. Laplacian Kernel

The Laplace kernel is completely equivalent to the exponential kernel, except for being less sensitive to changes in the sigma parameter. Being equivalent, it is also a radial basis function kernel.

k(x, y) = \exp\left(-\frac{\|x - y\|}{\sigma}\right)

It is important to note that the observations made about the sigma parameter for the Gaussian kernel also apply to the Exponential and Laplacian kernels.
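
Both kernels can be sketched in a few lines of NumPy (parameter defaults are illustrative only):

import numpy as np

def exponential_kernel(x, y, sigma=1.0):
    # Exponential kernel: k(x, y) = exp(-||x - y|| / (2 * sigma^2)).
    return np.exp(-np.linalg.norm(x - y) / (2.0 * sigma ** 2))

def laplacian_kernel(x, y, sigma=1.0):
    # Laplacian kernel: k(x, y) = exp(-||x - y|| / sigma).
    return np.exp(-np.linalg.norm(x - y) / sigma)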

6. ANOVA Kernel

The ANOVA kernel is also a radial basis function kernel, just as the Gaussian and Laplacian kernels. It is said to perform well in multidimensional regression problems (Hofmann, 2008).

k(x, y) = \sum_{k=1}^{n} \exp\left(-\sigma (x^k - y^k)^2\right)^d
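
A minimal sketch, assuming x and y are NumPy arrays with one entry per dimension (the defaults for sigma and d are my own):

import numpy as np

def anova_kernel(x, y, sigma=1.0, d=2):
    # ANOVA kernel: sum over dimensions of exp(-sigma * (x_k - y_k)^2)^d.
    return np.sum(np.exp(-sigma * (x - y) ** 2) ** d)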

7. Hyperbolic Tangent (Sigmoid) Kernel

The Hyperbolic Tangent Kernel is also known as the Sigmoid Kernel and as the Multilayer Perceptron (MLP) kernel. The Sigmoid Kernel comes from the Neural Networks field, where the bipolar sigmoid function is often used as an activation function for artificial neurons.

k(x, y) = \tanh(\alpha x^T y + c)

It is interesting to note that an SVM model using a sigmoid kernel function is equivalent to a two-layer perceptron neural network. This kernel was quite popular for support vector machines due to its origin in neural network theory. Also, despite being only conditionally positive definite, it has been found to perform well in practice.

There are two adjustable parameters in the sigmoid kernel: the slope alpha and the intercept constant c. A common value for alpha is 1/N, where N is the data dimension. A more detailed study on sigmoid kernels can be found in the works by Hsuan-Tien Lin and Chih-Jen Lin.
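
A minimal sketch that uses the common alpha = 1/N choice by default (the intercept default c = -1 is an arbitrary illustrative value):

import numpy as np

def sigmoid_kernel(x, y, alpha=None, c=-1.0):
    # Hyperbolic tangent (sigmoid) kernel: k(x, y) = tanh(alpha * <x, y> + c).
    if alpha is None:
        alpha = 1.0 / len(x)  # common choice: 1 / data dimension
    return np.tanh(alpha * np.dot(x, y) + c)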

8. Rational Quadratic Kernel

The Rational Quadratic kernel is less computationally intensive than the Gaussian kernel and can be used as an alternative when using the Gaussian becomes too expensive.

k(x, y) = 1 - \frac{\|x - y\|^2}{\|x - y\|^2 + c}

9. Multiquadric Kernel

The Multiquadric kernel can be used in the same situations as the Rational Quadratic kernel. As is the case with the Sigmoid kernel, it is also an example of a non-positive definite kernel.

k(x, y) = \sqrt{\|x - y\|^2 + c^2}

10. Inverse Multiquadric Kernel

As with the Gaussian kernel, the Inverse Multiquadric kernel results in a kernel matrix with full rank (Micchelli, 1986) and thus forms an infinite-dimensional feature space.

k(x, y) = \frac{1}{\sqrt{\|x - y\|^2 + c^2}}
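
The three kernels of this family (Rational Quadratic, Multiquadric and Inverse Multiquadric) can be sketched together; the constant c defaults to 1 purely for illustration:

import numpy as np

def rational_quadratic_kernel(x, y, c=1.0):
    # k(x, y) = 1 - ||x - y||^2 / (||x - y||^2 + c)
    d2 = np.linalg.norm(x - y) ** 2
    return 1.0 - d2 / (d2 + c)

def multiquadric_kernel(x, y, c=1.0):
    # k(x, y) = sqrt(||x - y||^2 + c^2)
    return np.sqrt(np.linalg.norm(x - y) ** 2 + c ** 2)

def inverse_multiquadric_kernel(x, y, c=1.0):
    # k(x, y) = 1 / sqrt(||x - y||^2 + c^2)
    return 1.0 / np.sqrt(np.linalg.norm(x - y) ** 2 + c ** 2)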

11. Circular Kernel

The circular kernel comes from a statistics perspective. It is an example of an isotropic stationary kernel and is positive definite in R^2.

k(x, y) = \frac{2}{\pi} \arccos\left(\frac{\|x - y\|}{\sigma}\right) - \frac{2}{\pi} \frac{\|x - y\|}{\sigma} \sqrt{1 - \left(\frac{\|x - y\|}{\sigma}\right)^2}

if \|x - y\| < \sigma, zero otherwise.

12. Spherical Kernel

The spherical kernel is similar to the circular kernel, but is positive definite in R^3.

k(x, y) = 1 - \frac{3}{2} \frac{\|x - y\|}{\sigma} + \frac{1}{2} \left(\frac{\|x - y\|}{\sigma}\right)^3

if \|x - y\| < \sigma, zero otherwise.
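
A sketch of the Circular and Spherical kernels, making the compact support explicit (both return zero whenever ||x - y|| >= sigma; the sigma default is illustrative):

import numpy as np

def circular_kernel(x, y, sigma=1.0):
    # Circular kernel, positive definite in R^2; zero outside ||x - y|| < sigma.
    t = np.linalg.norm(x - y) / sigma
    if t >= 1.0:
        return 0.0
    return (2.0 / np.pi) * (np.arccos(t) - t * np.sqrt(1.0 - t ** 2))

def spherical_kernel(x, y, sigma=1.0):
    # Spherical kernel, positive definite in R^3; zero outside ||x - y|| < sigma.
    t = np.linalg.norm(x - y) / sigma
    if t >= 1.0:
        return 0.0
    return 1.0 - 1.5 * t + 0.5 * t ** 3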

13. Wave Kernel

The Wave kernel is also symmetric positive semi-definite (Huang, 2008).

k(x, y) = \frac{\theta}{\|x - y\|} \sin\left(\frac{\|x - y\|}{\theta}\right)

14. Power Kernel

The Power kernel is also known as the (unrectified) triangular kernel. It is an example of a scale-invariant kernel (Sahbi and Fleuret, 2004) and is also only conditionally positive definite.

k(x, y) = -\|x - y\|^d

15. Log Kernel

The Log kernel seems to be particularly interesting for images, but is only conditionally positive definite.

k(x, y) = -\log\left(\|x - y\|^d + 1\right)
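
The Wave, Power and Log kernels are one-liners; a sketch (the theta and d defaults are arbitrary, and the Wave kernel is given its limiting value 1 when x = y):

import numpy as np

def wave_kernel(x, y, theta=1.0):
    # Wave kernel: k(x, y) = (theta / ||x - y||) * sin(||x - y|| / theta).
    dist = np.linalg.norm(x - y)
    if dist == 0.0:
        return 1.0  # limit as ||x - y|| -> 0
    return (theta / dist) * np.sin(dist / theta)

def power_kernel(x, y, d=2):
    # Power (unrectified triangular) kernel: k(x, y) = -||x - y||^d.
    return -np.linalg.norm(x - y) ** d

def log_kernel(x, y, d=2):
    # Log kernel: k(x, y) = -log(||x - y||^d + 1).
    return -np.log(np.linalg.norm(x - y) ** d + 1.0)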

16. Spline Kernel

The Spline kernel is given as a piece-wise cubic polynomial, as derived in the works by Gunn (1998).

k(x, y) = 1 + xy + xy \min(x, y) - \frac{x + y}{2} \min(x, y)^2 + \frac{1}{3} \min(x, y)^3

However, what it actually means is:

k(x, y) = \prod_{i=1}^{d} k(x_i, y_i)

with x, y \in \mathbb{R}^d.
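
A sketch of the product form, assuming x and y are 1-D NumPy arrays of equal length:

import numpy as np

def spline_kernel(x, y):
    # Product over dimensions of the piece-wise cubic polynomial above.
    m = np.minimum(x, y)
    terms = 1.0 + x * y + x * y * m - ((x + y) / 2.0) * m ** 2 + (m ** 3) / 3.0
    return np.prod(terms)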

17. B-Spline (Radial Basis Function) Kernel

The B-Spline kernel is defined on the interval [−1, 1]. It is given by the recursive formula:

k(x, y) = B_{2p+1}(x - y)

where p \in \mathbb{N} and B_{i+1} := B_i \otimes B_0.

In the work by Bart Hamers it is given by:

k(x, y) = \prod_{p=1}^{d} B_{2n+1}(x_p - y_p)

Alternatively, B_n can be computed using the explicit expression (Fomel, 2000):

B_n(x) = \frac{1}{n!} \sum_{k=0}^{n+1} \binom{n+1}{k} (-1)^k \left(x + \frac{n+1}{2} - k\right)_+^n

Where x_+ is defined as the truncated power function:

x_+^d = \begin{cases} x^d & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}
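
A sketch that builds B_n from the explicit expression (using the truncated power function) and then forms the kernel as the product over dimensions given above; the default n = 1 is an arbitrary choice:

import numpy as np
from math import comb, factorial

def b_spline(n, x):
    # B_n(x) via the explicit expression: (1/n!) * sum_k C(n+1, k) * (-1)^k * (x + (n+1)/2 - k)_+^n
    total = 0.0
    for k in range(n + 2):
        u = x + (n + 1) / 2.0 - k
        total += comb(n + 1, k) * (-1) ** k * (u ** n if u > 0 else 0.0)
    return total / factorial(n)

def b_spline_kernel(x, y, n=1):
    # B-Spline kernel: k(x, y) = prod_p B_{2n+1}(x_p - y_p).
    return np.prod([b_spline(2 * n + 1, xp - yp) for xp, yp in zip(x, y)])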

18. Bessel Kernel

The Bessel kernel is well known in the theory of function spaces of fractional smoothness. It is given by:

k(x, y) = \frac{J_{v+1}(\sigma \|x - y\|)}{\|x - y\|^{-n(v+1)}}

where J is the Bessel function of the first kind. However, in the kernlab documentation for R, the Bessel kernel is said to be:

k(x, x') = -\mathrm{Bessel}_{(\nu+1)}^{n}\left(\sigma \|x - x'\|^2\right)

19. Cauchy Kernel

The Cauchy kernel comes from the Cauchy distribution (Basak, 2008). It is a long-tailed kernel and can be used to give long-range influence and sensitivity over the high-dimensional space.

k(x, y) = \frac{1}{1 + \frac{\|x - y\|^2}{\sigma^2}}

20. Chi-Square Kernel

The Chi-Square kernel comes from the Chi-Square distribution.

k(x, y) = 1 - \sum_{i=1}^{n} \frac{(x_i - y_i)^2}{\frac{1}{2}(x_i + y_i)}

21. Histogram Intersection Kernel

The Histogram Intersection Kernel is also known as the Min Kernel and has been proven useful in image classification.

k(x, y) = \sum_{i=1}^{n} \min(x_i, y_i)

22. Generalized Histogram Intersection

The Generalized Histogram Intersection kernel is built based on the Histogram Intersection Kernel for image classification but applies in a much larger variety of contexts (Boughorbel, 2005). It is given by:

k(x, y) = \sum_{i=1}^{m} \min\left(|x_i|^\alpha, |y_i|^\beta\right)
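
Both intersection kernels are straightforward element-wise operations; a sketch (the alpha and beta defaults are illustrative):

import numpy as np

def histogram_intersection_kernel(x, y):
    # Histogram intersection (min) kernel: sum of element-wise minima.
    return np.sum(np.minimum(x, y))

def generalized_histogram_intersection_kernel(x, y, alpha=1.0, beta=1.0):
    # Generalized version: sum of min(|x_i|^alpha, |y_i|^beta).
    return np.sum(np.minimum(np.abs(x) ** alpha, np.abs(y) ** beta))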

23. Generalized T-Student Kernel

The Generalized T-Student Kernel has been proven to be a Mercer kernel, thus having a positive semi-definite kernel matrix (Boughorbel, 2004). It is given by:

k(x, y) = \frac{1}{1 + \|x - y\|^d}

24. Bayesian Kernel

The Bayesian kernel could be given as:

k(x, y) = \prod_{l=1}^{N} \kappa_l(x_l, y_l)

where

\kappa_l(a, b) = \sum_{c \in \{0, 1\}} P(Y = c \mid x_l = a) \, P(Y = c \mid y_l = b)

However, it really depends on the problem being modeled. For more information, please see the work by Alashwal, Deris and Othman, in which they used an SVM with Bayesian kernels in the prediction of protein-protein interactions.

25. Wavelet Kernel

The Wavelet kernel (Zhang et al, 2004) comes from Wavelet theory and is given as:

k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - c}{a}\right) h\left(\frac{y_i - c}{a}\right)

Where a and c are the wavelet dilation and translation coefficients, respectively (the form presented above is a simplification, please see the original paper for details). A translation-invariant version of this kernel can be given as:

k(x, y) = \prod_{i=1}^{N} h\left(\frac{x_i - y_i}{a}\right)

Where in both cases h(x) denotes a mother wavelet function. In the paper by Li Zhang, Weida Zhou, and Licheng Jiao, the authors suggest a possible h(x) as:

h(x) = \cos(1.75 x) \exp\left(-\frac{x^2}{2}\right)

Which they also prove to be an admissible kernel function.
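
A sketch of the translation-invariant form using that suggested mother wavelet (the dilation default a = 1 is an arbitrary illustrative value):

import numpy as np

def mother_wavelet(x):
    # Suggested mother wavelet: h(x) = cos(1.75 * x) * exp(-x^2 / 2).
    return np.cos(1.75 * x) * np.exp(-x ** 2 / 2.0)

def wavelet_kernel(x, y, a=1.0):
    # Translation-invariant wavelet kernel: prod_i h((x_i - y_i) / a).
    return np.prod(mother_wavelet((x - y) / a))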