# Category Archives: machine learning

## [FWD] Two stories from a research paper: Content Without Context is Meaningless


### 1.1 Machine Learning Hammer

Mark Twain once said: “To a man with a hammer, everything looks like a nail.” His observation is highly relevant to current trends in content analysis. We have a Machine Learning Hammer (ML Hammer) that we want to use on any problem that needs to be solved. The problem is neither with learning nor with the hammer; the problem is with people who fail to learn that not every problem is a new learning problem [1]. … If we can identify such a feature set, then we can easily model each object by its appropriate feature values. The challenges are

## [FWD] Experiences and Lessons in Developing Machine Learning and Data Mining Software

I came across these slides, which cover basic and useful experiences in using and developing a machine learning package. I found the examples on slides 13-19 and 31-33 very thoughtful, mainly because I suffered from the same problems recently.

The presenter is the author of LibSVM.

## Tikz example – Kernel trick

At first glance, the learning algorithm in Support Vector Machines can only solve linearly separable problems. However, this isn’t strictly true: since the feature vectors occur only in dot products $k(x_i, x_j) = x_i \cdot x_j$, the “kernel trick” can be applied by replacing the dot product with another kernel (Boser et al., 1992). A more formal statement of the kernel trick is that

Given an algorithm which is formulated in terms of a positive definite kernel k, one can construct an alternative algorithm by replacing k by another positive definite kernel k∗ (Schölkopf and Smola, 2002).

The best-known application of the kernel trick is the case where k is the dot product, but the trick is not limited to that case: both k and k∗ can be nonlinear kernels. More generally, given any feature map φ from observations into an inner product space, we obtain a kernel $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$.

This figure was drawn to illustrate the kernel trick with samples from two classes.
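The identity $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$ can be checked numerically. As a minimal sketch (the feature map `phi` and the two sample points are made up for illustration), the degree-2 polynomial kernel $k(x, y) = (x \cdot y)^2$ on 2-D inputs corresponds to the explicit feature map $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for a 2-D input:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(x, y):
    # Degree-2 polynomial kernel, computed directly in input space
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

# Same value either way: the kernel evaluates the inner product
# in feature space without ever constructing phi explicitly.
print(np.dot(phi(x), phi(y)))   # inner product in feature space
print(poly_kernel(x, y))        # kernel in input space
```

Both lines print 16.0, which is the point of the trick: an algorithm written in terms of dot products can work in the higher-dimensional feature space at the cost of evaluating k in the input space.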

## Recommend: A Course in Machine Learning

The following description is copied verbatim from the website of A Course in Machine Learning.

CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.

## Tikz example – SVM trained with samples from two classes

In machine learning, Support Vector Machines are supervised learning models used for classification and regression analysis. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. To classify examples, we choose the hyperplane so that the distance from it to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane and the linear classifier it defines is known as a maximum margin classifier.
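The maximum-margin idea above can be sketched in a few lines with scikit-learn (the toy data points here are made up for illustration; a large `C` approximates a hard margin):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in 2-D (toy data)
X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C ~ hard margin
clf.fit(X, y)

# The separating hyperplane classifies new points by which side they fall on
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))

# Only the points nearest the hyperplane (the support vectors)
# determine the maximum-margin solution
print(clf.support_vectors_)
```

Note that removing any non-support-vector training point would leave the learned hyperplane unchanged, which is why the model is named after the support vectors.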

## Book review: Introduction to Machine Learning (2ed)

Introduction to Machine Learning (2ed), by Ethem Alpaydin, MIT Press, 2010. ISBN 0-262-01243-X.

This book provides students, researchers, and developers with a comprehensive introduction to machine learning techniques. It is structured primarily as a coursebook, making it a valuable textbook for graduate or undergraduate teaching. It is also a good resource for self-study by researchers and developers, though they should already be familiar with AI and advanced mathematics.

The book begins with an introductory chapter, followed by 18 chapters plus an appendix. Each chapter presents a stand-alone topic, opening with a brief introduction and closing with notes, so readers can quickly obtain an overview of a topic and see possible directions for further development in that subject area. The book covers a variety of machine learning techniques: supervised and unsupervised learning, and parametric and nonparametric methods. These are followed by chapters on assessing and comparing classification algorithms, combining multiple learners, and reinforcement learning.

## Scale, Standardize, and Normalize Data

Note: Contents and examples in this article are partially from Scikit-learn’s “Preprocessing data” and faqs.org’s “Should I normalize/standardize/rescale the data?”

### Scaling

Scaling a vector means adding/subtracting a constant and then multiplying/dividing by another constant, so that the features lie between given minimum and maximum values. Motivations for this scaling include robustness to very small standard deviations of features and preservation of zero entries in sparse data. Normally, the given range is [0,1]:

$x^*_{i,j}=\frac{x_{i,j}-x^{min}_{j}}{x^{max}_{j}-x^{min}_{j}}$

For example, if we have a dataset like below,
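As a sketch of the formula above, using scikit-learn’s `MinMaxScaler` on a small made-up dataset (rows are samples, columns are features):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# A small made-up dataset: 3 samples, 3 features
X = np.array([[1.0, -1.0,  2.0],
              [2.0,  0.0,  0.0],
              [0.0,  1.0, -1.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)  # applies (x - min) / (max - min) per column
print(X_scaled)
```

Each column now lies in [0, 1]; e.g. the first column (1, 2, 0) has min 0 and max 2, so it maps to (0.5, 1.0, 0.0), matching the formula.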