### Spectral graph embedding doesn’t work on an adjacency matrix

I’ve heard several talks at conferences in the past few weeks where someone has run an eigendecomposition or SVD on an adjacency matrix and assumed that the embedding they end up with is meaningful. Some of them noticed that this embedding didn’t represent their graph very well. There’s a simple explanation for that — it’s wrong. In this post I’ll try and explain why.

For a graph with n nodes, an adjacency matrix is an nxn matrix whose ijth entry represents the weight of the edge connecting node i and node j. The entries of this matrix are all non-negative and the matrix must be symmetric. (There are ways to handle non-symmetric matrices, that is, directed graphs, but they require significantly more care to embed appropriately.)

Now remember, eigendecompositions or SVDs are numeric algorithms that don’t know that the content of this matrix represents a graph. They regard the rows of the adjacency matrix as vectors in an n-dimensional vector space — and this view does not fit very well with the graph that this matrix is representing. For example, a well-connected node in the graph has a corresponding row with many non-zero entries; as a vector, then, it is quite long and so the point corresponding to its end is far from the origin. A poorly connected node, on the other hand, has mostly zero entries in its row, so it corresponds to a short vector. All of the entries of the adjacency matrix are non-negative, so all of these vectors are in the positive hyperquadrant.

The cloud of points corresponding to the graph therefore looks like this figure:

graphcloud

where the red area represents the well-connected nodes of the graph. The eigendecomposition/SVD of this cloud corresponds to a rotation to new axes and (usually) a projection to a lower-dimensional space. There are several problems with this.

First, the well-connected nodes are on the outside of the cloud, but they should be in the middle — they are important and so should be central. Second, the well-connected nodes should be close to one another in general but they are spread along the outer shell of the cloud. In other words, the cloud derived from the adjacency matrix is inside-out with respect to the natural and expected structure of the graph. Any embedding derived from this cloud is going to inherit its inside-out structure and so will be close to useless.

There is also an equally serious issue: the direction of the first eigenvector of such a cloud will be the vector from the origin to ‘the center of the cloud’ because the numerically greatest variation is between the origin and this center. This vector is shown as black in the figure. So far so good: projection onto this vector does indeed provide an importance ranking for the graph nodes, with the most important projected onto the end away from the origin.

However, the second and subsequent axes are necessarily orthogonal to this first axis — but directions orthogonal to it do not tell us anything about the variation within the cloud. If we took exactly the same shaped cloud and moved it a little in the positive hyperquadrant, the first axis would change, forcing changes in all of the other axes, but the shape of the cloud has not changed! In other words, all of the axes after the first are meaningless as measures of variation in the graph.

The right way to embed a graph is to convert the adjacency matrix to one of several Laplacian matrices. This conversion has the effect of centering the cloud around the origin so that the eigendecomposition/SVD now finds the axes in which the cloud varies, and so gives you the embedding you want.