Support Vector Machines are often said to solve only linearly separable problems. This is not strictly true. Because the feature vectors occur only in dot products $k(x_i, x_j) = x_i \cdot x_j$, the "kernel trick" can be applied by replacing the dot product with another kernel (Boser et al., 1992). A more formal statement of the kernel trick is:
Given an algorithm which is formulated in terms of a positive definite kernel $k$, one can construct an alternative algorithm by replacing $k$ by another positive definite kernel $k^*$ (Schölkopf and Smola, 2002).
The best-known application of the kernel trick is the case where $k$ is the dot product, but the trick is not limited to that case: both $k$ and $k^*$ can be nonlinear kernels. More generally, given any feature map $\phi$ from observations into an inner product space, we obtain a kernel $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$.
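As a small worked example of a feature map inducing a kernel, consider two-dimensional inputs and the degree-2 polynomial kernel. The specific map $\phi$ and the vectors $x$, $z$ below are chosen here for illustration only and do not come from the cited works; this is a sketch of the idea, not the only possible choice.

```latex
% Illustrative assumption: x = (x_1, x_2) and z = (z_1, z_2) are points in R^2.
\[
  \phi(x) = \bigl( x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2 \bigr)
\]
% The dot product in the three-dimensional feature space collapses
% to a function of the dot product in the original space:
\begin{align*}
  \phi(x) \cdot \phi(z)
    &= x_1^2 z_1^2 + 2\, x_1 x_2 z_1 z_2 + x_2^2 z_2^2 \\
    &= (x_1 z_1 + x_2 z_2)^2 \\
    &= (x \cdot z)^2 .
\end{align*}
```

So $k(x, z) = (x \cdot z)^2$ can be evaluated directly in the input space, without ever forming $\phi(x)$ or $\phi(z)$.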
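The figure illustrates the kernel trick with samples from two classes. The feature map $\phi$ sends the samples into a transformed feature space, and the dot product is replaced by the corresponding nonlinear kernel. This allows the algorithm to fit the maximum-margin hyperplane in the transformed feature space, which corresponds to a nonlinear boundary in the original space.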
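To make the replacement concrete, the decision function of a trained classifier can be written in its usual dual form. The symbols $n$ (number of training samples), $\alpha_i$ (dual coefficients), $y_i \in \{-1, +1\}$ (class labels) and $b$ (offset) are standard notation assumed here; they do not appear in the text above.

```latex
% Linear case: the training samples enter only through dot products.
\[
  f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{n} \alpha_i y_i \,(x_i \cdot x) + b \Bigr)
\]
% Kernel trick: replace the dot product by k(x_i, x) = \phi(x_i) \cdot \phi(x).
\[
  f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{n} \alpha_i y_i \, k(x_i, x) + b \Bigr)
       = \operatorname{sign}\bigl( w \cdot \phi(x) + b \bigr),
  \qquad
  w = \sum_{i=1}^{n} \alpha_i y_i \, \phi(x_i)
\]
```

The second form is a hyperplane in the feature space (the right-hand panel of the figure), while in the input space the same function can trace a curved boundary (the left-hand panel).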
\documentclass[10pt,letterpaper]{article}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usepackage[active,tightpage,pdftex]{preview}
\PreviewEnvironment{tikzpicture}
\begin{document}
\begin{tikzpicture}[>=stealth',x=1cm,y=1cm]
  %\draw[color=gray] (0,0) grid (6,6);
  \draw (0,0) rectangle (6,6);
  % \draw line
  \draw[color=red,line width=2pt] (2,6)
    .. controls (3,5.5) and (3,5) .. (3,5)
    .. controls (3,4) and (2,2.5) .. (2,2)
    .. controls (2,1) and (2.8,1) .. (3,1)
    .. controls (3.5,1) and (3.5,2) .. (4,2)
    .. controls (4.5,2) and (6,0) .. (6,0);
  % \draw left dashed line
  \draw[dashed] (1.5,6)
    .. controls (2.5,5.5) and (2.5,5) .. (2.5,5)
    .. controls (2.5,4) and (1.5,2.5) .. (1.5,2)
    .. controls (1.5,.5) and (2.8,.5) .. (3,.5)
    .. controls (3.75,.5) and (3.5,1.5) .. (4,1.5)
    .. controls (4.5,1.5) and (5.5,0) .. (5.5,0);
  % \draw right dashed line
  \draw[dashed] (2.5,6)
    .. controls (3.5,5.5) and (3.5,5) .. (3.5,5)
    .. controls (3.5,4) and (2.5,2.5) .. (2.5,2)
    .. controls (2.5,1.5) and (2.8,1.5) .. (3,1.5)
    .. controls (3.25,1.5) and (3.5,2.5) .. (4,2.5)
    .. controls (4.5,2.5) and (6,0.5) .. (6,0.5);
  %\draw[color=gray] (2,6) -- (3,5) -- (2,2) -- (3,1) -- (4,2) -- (6,0);
  %\draw[color=gray] (1.5,6) -- (2.5,5) -- (1.5,2) -- (3,.5)-- (4,1.5)-- (5.5,0);
  %\draw[color=gray] (2.5,6) -- (3.5,5) -- (2.5,2) -- (3,1.5)-- (4,2.5)-- (6,0.5);

  %\draw[color=gray] (7,0) grid (13,6);
  \draw (7,0) rectangle (13,6);
  % \draw line
  \draw[color=red,line width=2pt] (8.5,6) -- (12,0);
  % \draw dashed line
  \draw[dashed] (8,6) -- (11.5,0);
  \draw[dashed] (9,6) -- (12.5,0);

  \draw[->,thick] (5,3) -- (8,3) node [above,pos=.5] {$\phi$};

  \def\positive{{%
    {2.3,5.3}, {3.5,.7}, {1.5,2}, {1.2,2.1}, {1.8,.8}, {1,5.5},
    {1.2,5.8}, {.75,.2}, {2,4}, {5, 0.5}, {1.5,3}, {2.3,.5},
    %
    {9.3,3.3}, {11,.8}, {8.5,2}, {7.2,4.1}, {8.8,.8}, {8,5.5},
    {8.2,5}, {7.75,.2}, {9,4.2}, {12, 0.5}, {8.5,3}, {9.3,.5},
  }}
  % \draw positive dots
  \foreach \i in {0,...,20} {
    \pgfmathparse{\positive[\i][0]}\let \x \pgfmathresult;
    \pgfmathparse{\positive[\i][1]}\let \y \pgfmathresult;
    \fill[black] (\x,\y) circle (2pt);
  }

  \def\negative{{%
    {4,2.5}, {3.5,5}, {2.6,1.6}, {4.5,5.2}, {5.5,3.7},
    {3.9,4.7}, {5,2.7}, {3.5,4.2}, {5.8,.9},
    %
    {10.75,3}, {10.5,5}, {11.6,1.6}, {11.5,5.2}, {12.5,3.7},
    {10.9,4.7}, {12,2.7}, {10.5,4.2}, {12.8,.9},
  }}
  % \draw negative dots
  \foreach \i in {0,...,16} {
    \pgfmathparse{\negative[\i][0]}\let \x \pgfmathresult;
    \pgfmathparse{\negative[\i][1]}\let \y \pgfmathresult;
    \draw[black] (\x,\y) circle (3pt);
  }
\end{tikzpicture}
\end{document}
References
- Bernhard E. Boser, Isabelle M. Guyon, and Vladimir N. Vapnik. 1992. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152. ACM Press.
- Bernhard Schölkopf and Alexander J. Smola. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press, Cambridge, MA, USA.
Thank you for your work.
There are some problems with your code display. Greater than symbols and backslashes are not displayed.
I would also suggest using values like 5.9 and 0.1 instead of 6 and 0 for the hyperplanes, because then they don't go over the box lines.
Best Wishes,
Moritz