Recently I was asked about the difference between PCA and SVD, and I realized I could not answer it well. So today let's extend my last post on PCA and briefly summarize the similarities and differences between PCA and SVD.
1. SVD for dimension reduction
1.1. SVD in PCA
First, let's review how to use SVD to perform PCA:
SVD tells us that any matrix \(X\) can be factored into the product of an orthogonal matrix, a diagonal matrix, and another orthogonal matrix: \[X = U \Sigma V^T\] Then the covariance matrix of \(X\) is: \[\begin{align*} C_X &= \frac{1}{n-1}X^TX \\ &= \frac{1}{n-1}(V \Sigma U^T)(U\Sigma V^T) \\ &= \frac{1}{n-1}V\Sigma^2V^T \end{align*}\]
If we define \(P = V\), then: \[\begin{align*} C_Y &= \frac{1}{n-1}(XP)^T(XP) \\ &= \frac{1}{n-1}P^T V \Sigma U^T U \Sigma V^T P \\ &= \frac{1}{n-1}\Sigma^2 \end{align*}\] which is diagonal. Therefore, the columns of \(V\) are the loading vectors of \(X\).
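As a quick sanity check of this derivation, here is a minimal NumPy sketch (the data and variable names are my own, chosen just for illustration) comparing the eigenvectors of the covariance matrix with the right singular vectors of the centered data; they agree up to a sign flip per column.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Xc = X - X.mean(axis=0)                # center the data (PCA assumes zero-mean columns)

# PCA route: eigendecomposition of the covariance matrix
C = Xc.T @ Xc / (Xc.shape[0] - 1)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]      # sort eigenpairs from largest to smallest eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# SVD route: right singular vectors of the centered data matrix
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# The loading vectors agree up to a sign flip per column ...
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T)))        # True
# ... and the eigenvalues equal the squared singular values divided by (n - 1)
print(np.allclose(eigvals, s**2 / (Xc.shape[0] - 1)))    # True
```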
1.2. SVD itself performs dimension reduction
SVD is not only useful for deriving PCA; it can also perform dimension reduction by itself. Recall that SVD is: \[X_{(m,n)} = U_{(m,m)} \Sigma_{(m,n)} V^T_{(n,n)}\]
In practice, the sum of the top 10% or even 1% of the diagonal elements of \(\Sigma_{(m,n)}\), which are the singular values, often accounts for roughly 99% of the sum of all singular values. In that case we can keep only the top \(k\) singular values and the corresponding columns of \(U\) and \(V\) to approximate the original matrix: \[X_{(m,n)} \approx U_{(m,k)} \Sigma_{(k,k)} V^T_{(k,n)}\]
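Here is a small NumPy sketch of this idea (the matrix is randomly generated and the rank \(k\) is picked by hand, purely for illustration): keep only the top-\(k\) singular triplets and reconstruct a low-rank approximation of \(X\).

```python
import numpy as np

rng = np.random.default_rng(1)
# Build a matrix that is approximately rank 5, plus a little noise
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 50)) + 0.01 * rng.normal(size=(200, 50))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5                                          # number of singular values to keep
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # rank-k approximation of X

# The top-k singular values capture almost all of the total
print(s[:k].sum() / s.sum())                         # close to 1
print(np.linalg.norm(X - X_k) / np.linalg.norm(X))   # small relative reconstruction error
```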
2. Differences compared to PCA
Purpose
- SVD is purely a method for decomposing a matrix. It generalizes the eigendecomposition of a square normal matrix to any \(m\times n\) matrix via an extension of the polar decomposition.
- PCA is specifically a method for dimension reduction.
Method
- PCA needs to compute the covariance matrix, which takes a lot of time when there are many samples and features.
- SVD can compute the right singular matrix \(V\) without forming the covariance matrix \(X^TX\), so it typically takes less time than PCA implemented via eigendecomposition of the covariance matrix.
- Also, you may notice that when we use SVD for PCA, we only use the right singular matrix \(V\) and never the left one \(U\). Say we have \(X_{(m,n)}\) and take the first \(d\) left singular vectors to form \(U_{(m,d)}\). If we compute \(X' = U^T_{(d,m)} X_{(m,n)}\), the result is a \(d \times n\) matrix. This tells us that the left singular matrix \(U\) compresses the data at the row level while the right singular matrix \(V\) compresses it at the column level. In other words, SVD can compress the data in two directions, whereas PCA only gives feature-wise dimension reduction (see the sketch after this list).
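To make the two-direction point concrete, here is a short NumPy sketch (the dimensions \(m\), \(n\), \(d\) are arbitrary choices for illustration): \(V\) compresses the columns (features), while \(U\) compresses the rows (samples).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, d = 100, 20, 5
X = rng.normal(size=(m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Column-level (feature-wise) compression, i.e. what PCA does with V:
X_cols = X @ Vt[:d, :].T        # shape (m, d): each sample described by d features

# Row-level (sample-wise) compression using the left singular vectors:
X_rows = U[:, :d].T @ X         # shape (d, n): each feature described by d "meta-samples"

print(X_cols.shape, X_rows.shape)   # (100, 5) (5, 20)
```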
Implementation
- SVD can handle sparse matrices better, which saves space for large matrices (a sketch follows this list).
- PCA needs to center/normalize the data first, which destroys the sparsity of the original matrix.
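As an illustration of the sparsity point, the sketch below uses scikit-learn's TruncatedSVD on a SciPy sparse matrix (the matrix here is random and only for demonstration); because TruncatedSVD does not center the data, the input can stay in sparse format throughout.

```python
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD

# A large, very sparse random matrix (99% zeros), kept in CSR format
X_sparse = sp.random(10_000, 1_000, density=0.01, format="csr", random_state=0)

# TruncatedSVD works directly on the sparse matrix without densifying it
svd = TruncatedSVD(n_components=20, random_state=0)
X_reduced = svd.fit_transform(X_sparse)

print(X_reduced.shape)                        # (10000, 20)
print(svd.explained_variance_ratio_.sum())    # fraction of variance retained
```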
Hopefully after reading this short article you have a deeper understanding of PCA and SVD.