Note: Contents and examples in this article are partially from Scikit-learn-Preprocessing data and faqs.org-Should I normalize/standardize/rescale the data
Scaling
Scaling a vector means to add/subtract a constant, then multiply/divide by another constant, so that the features lie between given minimum and maximum values. Motivations for this scaling include robustness to very small standard deviations of features and preserving zero entries in sparse data. Normally, the given range is [0, 1].
For example, suppose we have the dataset

    X = [[ 1, -1,  2],
         [ 2,  0,  0],
         [ 0,  1, -1]]

where each column is one feature. The scaling process will be
- take the first column vector x0 = [1, 2, 0]T;
- find x0,max = 2 and x0,min = 0;
- scale each entry: x'i,0 = (xi,0 - x0,min) / (x0,max - x0,min) = (xi,0 - 0) / (2 - 0).
Therefore, the scaled x0 is [0.5, 1, 0]T. Repeating the same process for x1 and x2, the scaled dataset is

    X_scaled = [[0.5, 0,    1   ],
                [1,   0.5,  0.33],
                [0,   1,    0   ]]
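The steps above can be reproduced with scikit-learn's MinMaxScaler (scikit-learn is the article's cited source); a minimal sketch, assuming the example matrix X from above:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# the example dataset: columns are the features x0, x1, x2
X = np.array([[1., -1.,  2.],
              [2.,  0.,  0.],
              [0.,  1., -1.]])

# scale each column into the default feature_range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)

print(X_scaled[:, 0])  # first column [1, 2, 0] -> [0.5, 1, 0]
```

Note that the scaler learns x_min and x_max per column in fit, so the same transform can later be applied to new data.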
Standardizing
Standardization of a dataset makes each individual feature look like standard normally distributed data: Gaussian with zero mean and unit variance. Each entry of a column vector xj is transformed as

    x'i,j = (xi,j - x̄j) / σj

where:
- x̄j is the mean of the vector, and
- σj is the standard deviation of the vector.
Let's take X as the example again. The standardizing process will be
- take the first column vector x0 = [1, 2, 0]T;
- calculate the mean of x̄0 = (1+2+0)/3 = 1
- calculate the standard deviation of x0: σ0 = sqrt(((1-1)² + (2-1)² + (0-1)²) / 3) = sqrt(2/3) ≈ 0.82;
- calculate the standardized entries: x'i,0 = (xi,0 - x̄0) / σ0 = (xi,0 - 1) / 0.82.
Therefore, the standardized x0 is [0, 1.22, -1.22]T. Repeating the same process for x1 and x2, the standardized dataset is

    X_std = [[ 0,    -1.22,  1.34],
             [ 1.22,  0,    -0.27],
             [-1.22,  1.22, -1.07]]
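The same standardization is available in scikit-learn as StandardScaler; a minimal sketch, assuming the example matrix X from above (StandardScaler divides by the population standard deviation, i.e. by n, which matches the sqrt(2/3) calculation above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1., -1.,  2.],
              [2.,  0.,  0.],
              [0.,  1., -1.]])

# center each column to zero mean, then divide by its standard deviation
X_std = StandardScaler().fit_transform(X)

print(X_std[:, 0])  # first column [1, 2, 0] -> [0, 1.22, -1.22]
```

After the transform, every column has zero mean and unit variance by construction.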
Normalizing
Normalizing a vector is the process of scaling it to have unit norm. The motivation is to quantify the similarity of any pair of vectors via their dot product.
Let's take X as the example again. The normalizing process will be
- take the first row vector x0 = [1, -1, 2];
- calculate the l2 norm of x0: ||x0|| = sqrt(1² + (-1)² + 2²) = sqrt(6) ≈ 2.45;
- normalize x0: x'0 = x0 / ||x0|| = [1/2.45, -1/2.45, 2/2.45].
Therefore, the normalized x0 is [0.41, -0.41, 0.82]. Repeating the same process for x1 and x2, the normalized dataset is

    X_norm = [[0.41, -0.41,  0.82],
              [1,     0,     0   ],
              [0,     0.71, -0.71]]
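Row-wise unit-norm scaling is available in scikit-learn as Normalizer; a minimal sketch, assuming the example matrix X from above (note that, unlike the two scalers, Normalizer works on rows, i.e. on samples rather than features):

```python
import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[1., -1.,  2.],
              [2.,  0.,  0.],
              [0.,  1., -1.]])

# scale each row to unit l2 norm
X_norm = Normalizer(norm='l2').fit_transform(X)

print(X_norm[0])  # first row [1, -1, 2] -> [0.41, -0.41, 0.82]
```

After normalization, the dot product of any two rows equals their cosine similarity, since every row has norm 1.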