SPECTRAL CLUSTERING
Spectral clustering is a class of clustering algorithms that use the eigenvectors of a matrix derived from the data's similarity graph to identify groups of data points.
The first step in spectral clustering is to construct a similarity graph. Let x_1, ..., x_n be a set of data points. The similarity graph, G = (V, E), can then be constructed with each vertex v_i representing one data point x_i and edges between points v_i and v_j weighted by their similarity score, s_ij. Generally all edges with s_ij > 0 are included, but a higher threshold can also be used. Various methods exist for calculating similarity scores. Two examples are the ε-neighborhood, where s_ij = 1 if ‖x_i − x_j‖ < ε and s_ij = 0 otherwise, and k-nearest neighbors, where s_ij = 1 if v_j is among the k nearest neighbors of v_i and s_ij = 0 otherwise.
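The two graph constructions described above can be sketched as follows in numpy; the function names and the symmetrization convention for the k-nearest-neighbor graph are illustrative choices, not part of any standard API.

```python
import numpy as np

def eps_graph(X, eps):
    """ε-neighborhood graph: s_ij = 1 if ||x_i - x_j|| < eps, else 0."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    W = (D < eps).astype(float)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    return W

def knn_graph(X, k):
    """k-nearest-neighbor graph, symmetrized so the result is undirected."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]     # skip index 0: the point itself
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)                # connect i and j if either is a neighbor of the other
```

Note that the raw k-nearest-neighbor relation is not symmetric, so some convention (here, taking the maximum of W and its transpose) is needed to obtain an undirected graph.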
Next, a graph Laplacian is used to transform the adjacency matrix, W = (w_ij), i, j = 1, ..., n, of the similarity graph. Various graph Laplacians have been developed in the literature on spectral graph theory. These include the unnormalized graph Laplacian, L = D − W, where D is the diagonal matrix of vertex degrees, and several normalized graph Laplacians [8].
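As a minimal sketch, both the unnormalized Laplacian L = D − W and the symmetric normalized variant L_sym = I − D^(−1/2) W D^(−1/2) can be computed directly from the adjacency matrix (the function name is my own):

```python
import numpy as np

def laplacians(W):
    """Unnormalized (L = D - W) and symmetric normalized Laplacians of W."""
    d = W.sum(axis=1)                              # vertex degrees
    L = np.diag(d) - W                             # unnormalized graph Laplacian
    d_inv_sqrt = np.zeros_like(d)
    d_inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])    # guard against isolated vertices
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym
```

Both matrices are symmetric and positive semidefinite, and the rows of the unnormalized Laplacian sum to zero.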
The next step is to calculate the first k eigenvectors, u_1, ..., u_k, of L and assemble these into a matrix U whose columns are the eigenvectors. (In some algorithms, the generalized eigenvectors are computed instead.)
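Since the Laplacian is symmetric, this step is a standard eigendecomposition; a sketch using numpy's `eigh`, which returns eigenvalues in ascending order so the first k columns correspond to the k smallest eigenvalues:

```python
import numpy as np

def spectral_embedding(L, k):
    """First k eigenvectors of the symmetric Laplacian L, as columns of U."""
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh: eigenvalues in ascending order
    return eigvecs[:, :k]                  # row i of the result is the new point y_i
```

For a graph with k connected components, the first k eigenvalues are zero and these eigenvectors are constant on each component, which is what makes the subsequent clustering step easy.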
Finally, for i = 1, ..., n, let y_i ∈ ℝ^k be the vector corresponding to the i-th row of U. Then cluster the points y_i, i = 1, ..., n, in ℝ^k with the k-means algorithm into clusters C_1, ..., C_k.
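The final step can be carried out with any k-means implementation; as an illustration, a plain Lloyd's-algorithm sketch that clusters the rows of U (the initialization and iteration cap are arbitrary choices):

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns a cluster label for each row of Y."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]   # random initial centers
    for _ in range(n_iter):
        dists = np.linalg.norm(Y[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)                        # assign to nearest center
        new_centers = np.array([Y[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])          # recompute centers
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In practice a library implementation with multiple restarts would normally be used here, since plain Lloyd's algorithm is sensitive to its initialization.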
The advantages of spectral clustering are that it avoids strong assumptions about the shape of clusters and becomes a linear problem once the similarity graph has been chosen [7]. As mentioned above, k-means works best on clusters that are "blob-like". The transformations involved in spectral clustering allow more complex geometries, such as rings or even intertwined spirals, to be identified. Computationally, spectral clustering algorithms are easy to implement and are not sensitive to local minima or initial conditions.
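The ring example can be made concrete. The following self-contained sketch runs the whole pipeline on two concentric rings, a geometry k-means cannot separate in the original coordinates; the point counts, radii, and choice of 3 neighbors are illustrative, not prescribed values.

```python
import numpy as np

# Two concentric rings of 30 points each (radii 1 and 3).
t = np.linspace(0, 2 * np.pi, 30, endpoint=False)
X = np.vstack([np.c_[np.cos(t), np.sin(t)],            # inner ring
               np.c_[3 * np.cos(t), 3 * np.sin(t)]])   # outer ring

# Step 1: symmetrized 3-nearest-neighbor similarity graph.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.zeros_like(D)
for i in range(len(X)):
    W[i, np.argsort(D[i])[1:4]] = 1.0
W = np.maximum(W, W.T)

# Step 2: unnormalized graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# Step 3: first k = 2 eigenvectors of L; rows of U are the embedded points y_i.
_, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :2]

# Step 4: cluster the rows. The rings form separate graph components, so
# their embedded rows coincide; a single nearest-center assignment, seeded
# with one row from each ring, already recovers the two rings.
centers = U[[0, 30]]
labels = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
```

After the spectral embedding, every point from a given ring is mapped to (numerically) the same location in ℝ², which is why the nonconvex rings become trivially separable.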
The disadvantages of spectral clustering lie in constructing a good similarity graph and choosing an appropriate value of k. Constructing a similarity graph is nontrivial, and the structure of the similarity graph has been shown to impact clustering convergence. The difficulty in choosing k is largely the same as with k-means.