How Does It Work?
The goal of the PCA method is to find the principal components of a dataset. The principal components are the values of the dataset projected onto the principal directions. The direction along which the projected values have the largest variance is taken as the first principal direction, followed by the second principal direction, and so on. The principal directions are mutually orthogonal.
Therefore, the PCA method can be explained by answering two questions: a) what are the principal components and principal directions, and b) how can the principal components and principal directions be found?
Principal Components
The first principal component is defined in such a way that if we project the dataset linearly onto a vector, the projected values of the samples have the largest possible variance; the projected value of each sample is then called its principal component, and this vector is the principal direction. By making the variance of the projected values as large as possible, we guarantee that the samples are spread out as much as possible in this direction. Each succeeding principal component in turn has the largest projected variance under the constraint that its direction is orthogonal to the preceding ones.
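This definition can be stated compactly as an optimization problem. As a sketch of the idea (with $X$ denoting the standardized data matrix of $n$ samples and $w$ a candidate projection vector, notation introduced here only for illustration), the first principal direction is

$$w_1 = \arg\max_{\lVert w \rVert = 1} \mathrm{Var}(Xw),$$

and each succeeding direction $w_k$ maximizes the same projected variance subject to the additional constraint $w_k \perp w_1, \dots, w_{k-1}$.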
Principal Components Calculation
There are many approaches to calculating the principal components of a dataset, such as the gradient descent method, eigen-decomposition of the covariance/correlation matrix, singular value decomposition (SVD), and so on (Shamir, 2016; Smith, 2002; Wall et al., 2003). In this section, we give a step-by-step tutorial of the eigen-decomposition method and the singular value decomposition method. The latter is well known for its high efficiency and is widely used by machine learning packages.
Step 1. Data preprocessing – standardization
Standardizing the data prior to PCA is crucial for most datasets. By subtracting the mean and dividing by the standard deviation of each column, each feature of the dataset is transformed to unit scale (mean = 0 and variance = 1). This step is especially useful when the features are measured on different scales: if one feature's scale is much larger than the others', its variance will be significantly larger as well, giving that feature a disproportionately high weight in the calculation of the principal components. Mathematically, each entry of the dataset can be standardized by the following equation:

$$z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$$

where $z_{ij}$ is the standardized value of the $i$th sample's $j$th feature; $x_{ij}$ is the original value in the dataset; $\mu_j$ is the mean of the $j$th feature; and $\sigma_j$ is the standard deviation of the $j$th feature.
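As a minimal illustration, here is a NumPy sketch of this step (the array X and its values are hypothetical, not taken from the text):

```python
import numpy as np

# Hypothetical dataset: n samples (rows) x p features (columns),
# measured on very different scales
X = np.array([[2.5, 2400.0],
              [0.5,  700.0],
              [2.2, 2900.0],
              [1.9, 2200.0],
              [3.1, 3000.0]])

# Standardize each column: subtract its mean, divide by its standard deviation
mu = X.mean(axis=0)
sigma = X.std(axis=0)        # population std; ddof=1 would give the sample std
Z = (X - mu) / sigma

print(Z.mean(axis=0))        # ~0 for every feature
print(Z.std(axis=0))         # 1 for every feature
```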
Step 2. Covariance/correlation matrix calculation
The classic approach to PCA applies eigen-decomposition to the dataset's covariance or correlation matrix. The eigenvectors are the principal directions, while each eigenvalue represents the variance of the dataset along the corresponding eigenvector. In other words, the eigenvector with the highest eigenvalue is the first principal direction, the second-largest eigenvalue corresponds to the second principal direction, and so on.
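Concretely, for a standardized data matrix $Z$ with $n$ rows, the covariance matrix can be computed as $C = \frac{1}{n-1} Z^{T} Z$ (up to the choice of $n$ versus $n-1$ normalization, this is also the correlation matrix of the original data). A small self-contained NumPy sketch, with hypothetical random data standing in for a real dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # hypothetical 100 samples, 3 features
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardized as in Step 1

n = Z.shape[0]
C = (Z.T @ Z) / (n - 1)                      # p x p covariance matrix

# Sanity check against the built-in: np.cov expects variables in rows,
# and normalizes by n-1 by default
assert np.allclose(C, np.cov(Z.T))
```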
Step 3. Eigen-decomposition of the covariance or correlation matrix

Both the covariance matrix and the correlation matrix are real symmetric matrices of dimension $p \times p$ (where $p$ is the number of features). The eigen-decomposition of a real symmetric matrix can be represented as

$$C = V \Lambda V^{T}$$

where the columns of $V$ are the orthonormal eigenvectors of $C$ (the principal directions) and $\Lambda$ is a diagonal matrix whose entries are the corresponding eigenvalues. Projecting the standardized data onto the eigenvectors, sorted by decreasing eigenvalue, yields the principal components.

Different from the eigen-decomposition of the covariance/correlation matrix, the SVD method can compute the principal directions and the principal components at the same time, which makes it very efficient for computations on big datasets.
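To make the two routes concrete, here is a self-contained NumPy sketch (variable names and the random data are illustrative, not a package implementation) showing that the eigen-decomposition of $C$ and the SVD of $Z$ recover the same principal directions and components:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # hypothetical dataset
Z = (X - X.mean(axis=0)) / X.std(axis=0)     # Step 1: standardization
n = Z.shape[0]
C = (Z.T @ Z) / (n - 1)                      # Step 2: covariance matrix

# Route 1: eigen-decomposition of C (Step 3)
eigvals, eigvecs = np.linalg.eigh(C)         # eigh is for symmetric matrices
order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
pcs_eig = Z @ eigvecs                        # principal components (scores)

# Route 2: SVD of the standardized data matrix
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pcs_svd = U * S                              # equivalently Z @ Vt.T

# The two routes agree: the eigenvalues of C equal S**2 / (n - 1), and the
# components match up to the arbitrary sign of each direction
assert np.allclose(eigvals, S**2 / (n - 1))
assert np.allclose(np.abs(pcs_eig), np.abs(pcs_svd))
```

Applying SVD directly to the data matrix avoids forming the covariance matrix explicitly and tends to be more numerically stable, which is one reason many machine learning packages take this route.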