Note: Contents and examples in this article are partially from Scikit-learn-Preprocessing data and faqs.org-Should I normalize/standardize/rescale the data

## Scaling

**Scaling** a vector means adding or subtracting a constant and then multiplying or dividing by another constant, so that the features lie between a given minimum and maximum value. The motivation for this kind of scaling includes robustness to very small standard deviations of features and preserving zero entries in sparse data. Normally, the given range is [0, 1].

For example, if we have a dataset like below,

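The scaling process is:

- take the first column vector $x_0 = [1, 2, 0]^T$;
- find its extremes, $x_0^{max} = 2$ and $x_0^{min} = 0$;
- scale each entry $x_{i,0}$:

$$x_{i,0}^{scaled} = \frac{x_{i,0} - x_0^{min}}{x_0^{max} - x_0^{min}}$$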

Therefore, the scaled $x_0$ is $[0.5, 1, 0]^T$. Repeating the same process for the columns $x_1$ and $x_2$ gives the scaled dataset.
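The steps above can be sketched in plain Python. The matrix `X` used here is an assumption: it is the 3x3 example from the scikit-learn preprocessing documentation, which agrees with the first column $[1, 2, 0]^T$ and the first row $[1, -1, 2]$ quoted in this article. The helper name `min_max_scale_columns` is hypothetical and only serves the illustration.

```python
# Assumed dataset: the 3x3 example from the scikit-learn preprocessing docs.
X = [
    [1.0, -1.0, 2.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, -1.0],
]


def min_max_scale_columns(rows, low=0.0, high=1.0):
    """Linearly rescale every column of `rows` into the range [low, high]."""
    scaled_columns = []
    for col in zip(*rows):  # iterate over columns (features)
        col_min, col_max = min(col), max(col)
        span = col_max - col_min
        if span == 0:
            # A constant column carries no spread; map it to `low`.
            scaled_columns.append([low for _ in col])
        else:
            scaled_columns.append(
                [low + (v - col_min) / span * (high - low) for v in col]
            )
    # Transpose back so that each inner list is a row (sample) again.
    return [list(row) for row in zip(*scaled_columns)]


X_scaled = min_max_scale_columns(X)
for row in X_scaled:
    print([round(v, 2) for v in row])
# [0.5, 0.0, 1.0]
# [1.0, 0.5, 0.33]
# [0.0, 1.0, 0.0]
```

In scikit-learn, the same per-column rescaling is what `sklearn.preprocessing.MinMaxScaler` provides; the pure-Python version is shown only to make each step visible.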

## Standardizing

**Standardization** of a dataset rescales each individual feature so that it has zero mean and unit variance, like standard normally distributed (Gaussian) data.

$$x_{i,j}^{std} = \frac{x_{i,j} - \bar{x}_j}{\sigma_j}$$

where:

- $\bar{x}_j$ is the mean of the column vector $x_j$, and
- $\sigma_j$ is the standard deviation of the column vector $x_j$.

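Let's take `X` as the example again. The standardizing process is:

- take the first column vector $x_0 = [1, 2, 0]^T$;
- calculate the mean, $\bar{x}_0 = (1 + 2 + 0)/3 = 1$;
- calculate the standard deviation of $x_0$:

$$\sigma_0 = \sqrt{\frac{(1-1)^2 + (2-1)^2 + (0-1)^2}{3}} = \sqrt{\frac{2}{3}} \approx 0.82$$

- calculate the standardized entries:

$$x_{i,0}^{std} = \frac{x_{i,0} - \bar{x}_0}{\sigma_0}$$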

Therefore, the standardized $x_0$ is $[0, 1.22, -1.22]^T$. Repeating the same process for the columns $x_1$ and $x_2$ gives the standardized dataset.
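As a minimal sketch, the standardizing steps can be written in plain Python as follows. Two assumptions are made: `X` is the 3x3 example matrix from the scikit-learn preprocessing documentation (consistent with the first column quoted above), and the standard deviation is the population one (dividing by n), which is what reproduces the value 1.22 in the worked example. The function name `standardize_columns` is hypothetical.

```python
import math

# Assumed dataset: the 3x3 example from the scikit-learn preprocessing docs.
X = [
    [1.0, -1.0, 2.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, -1.0],
]


def standardize_columns(rows):
    """Shift and rescale every column to zero mean and unit variance."""
    standardized_columns = []
    for col in zip(*rows):  # iterate over columns (features)
        n = len(col)
        mean = sum(col) / n
        # Population standard deviation (divide by n, not n - 1).
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        if std == 0:
            # A constant column has no spread; centre it and leave it at 0.
            standardized_columns.append([0.0 for _ in col])
        else:
            standardized_columns.append([(v - mean) / std for v in col])
    # Transpose back so that each inner list is a row (sample) again.
    return [list(row) for row in zip(*standardized_columns)]


X_std = standardize_columns(X)
first_column = [round(row[0], 2) for row in X_std]
print(first_column)  # [0.0, 1.22, -1.22]
```

scikit-learn offers the same transformation as `sklearn.preprocessing.StandardScaler`, which also uses the population standard deviation.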

## Normalizing

**Normalizing** a vector is the process of scaling it to have unit norm. The motivation is to quantify the similarity of any pair of vectors using the dot product.

Let's take `X` as the example again. The normalizing process is:

- take the first **row** vector $x_0 = [1, -1, 2]$;
- calculate the norm of $x_0$:

$$\|x_0\| = \sqrt{1^2 + (-1)^2 + 2^2} = \sqrt{6} \approx 2.45$$

- normalize $x_0$:

$$\frac{x_0}{\|x_0\|}$$

Therefore, the normalized $x_0$ is $[0.41, -0.41, 0.82]$. Repeating the same process for the rows $x_1$ and $x_2$ gives the normalized dataset.
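The row-wise normalization above can be sketched in plain Python. As before, `X` is assumed to be the 3x3 example matrix from the scikit-learn preprocessing documentation, and the Euclidean (L2) norm is assumed because it reproduces the values 0.41 and 0.82 in the worked example. The function name `l2_normalize_rows` is hypothetical.

```python
import math

# Assumed dataset: the 3x3 example from the scikit-learn preprocessing docs.
X = [
    [1.0, -1.0, 2.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, -1.0],
]


def l2_normalize_rows(rows):
    """Divide every row (sample) by its Euclidean norm."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        if norm == 0:
            # An all-zero row has no direction; keep it unchanged.
            normalized.append(list(row))
        else:
            normalized.append([v / norm for v in row])
    return normalized


X_norm = l2_normalize_rows(X)
for row in X_norm:
    print([round(v, 2) for v in row])
# [0.41, -0.41, 0.82]
# [1.0, 0.0, 0.0]
# [0.0, 0.71, -0.71]
```

Unlike scaling and standardizing, this operates on rows rather than columns. Once every row has unit norm, the dot product of two rows equals their cosine similarity. In scikit-learn, `sklearn.preprocessing.normalize` performs this row-wise normalization.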