In Support Vector Machines, the learning algorithm can, at first sight, only solve linearly separable problems. However, this is not strictly true: since the feature vectors occur only in dot products $k(x_i, x_j) = x_i \cdot x_j$, the “kernel trick” can be applied by replacing the dot product with another kernel (Boser et al., 1992). A more formal statement of the kernel trick is:
Given an algorithm which is formulated in terms of a positive definite kernel $k$, one can construct an alternative algorithm by replacing $k$ by another positive definite kernel $k^*$ (Schölkopf and Smola, 2002).
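As a concrete illustration (a minimal NumPy sketch, not drawn from the sources above), the degree-2 polynomial kernel $k(x, z) = (x \cdot z + 1)^2$ evaluated directly on two inputs agrees with an ordinary dot product taken after an explicit feature map $\varphi$; the kernel performs the feature-space computation implicitly, without ever constructing $\varphi(x)$:

```python
import numpy as np

def phi(x):
    # Explicit feature map for the degree-2 polynomial kernel in 2D:
    # phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2, sqrt(2)*x1, sqrt(2)*x2, 1)
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def poly_kernel(x, z):
    # Kernel evaluated directly on the inputs: k(x, z) = (x . z + 1)^2
    return (np.dot(x, z) + 1.0) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both computations agree (up to floating-point rounding): the kernel
# implicitly works in the 6-dimensional feature space defined by phi.
print(poly_kernel(x, z))        # 4.0
print(np.dot(phi(x), phi(z)))   # 4.0
```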
The best known application of the kernel trick is in the case where $k$ is the dot product, but the trick is not limited to that case: both $k$ and $k^*$ can be nonlinear kernels. More generally, given any feature map $\varphi$ from observations into an inner product space, we obtain a kernel $k(x, x') = \langle \varphi(x), \varphi(x') \rangle$.
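Conversely, a short sketch (the feature map $\varphi$ below is an arbitrary, hypothetical choice, purely for illustration) shows that any feature map induces a valid kernel: the Gram matrix it produces is symmetric positive semidefinite by construction, which is exactly the positive definiteness the formal statement above requires:

```python
import numpy as np

def phi(x):
    # An arbitrary (hypothetical) nonlinear feature map into R^3.
    return np.array([x, x**2, np.sin(x)])

def k(x, xprime):
    # Any feature map induces a kernel via the inner product in
    # feature space: k(x, x') = <phi(x), phi(x')>.
    return np.dot(phi(x), phi(xprime))

# Gram matrix on a small sample; its eigenvalues are non-negative
# (up to numerical tolerance), confirming k is a valid kernel.
xs = np.array([-1.0, 0.0, 0.5, 2.0])
K = np.array([[k(a, b) for b in xs] for a in xs])
print(np.linalg.eigvalsh(K) >= -1e-10)  # all True
```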
Figure: the “kernel trick” illustrated with samples from two classes.