## Posts Tagged 'spectral graph theory'

### Spectral graph embedding doesn’t work on an adjacency matrix

I’ve heard several talks at conferences in the past few weeks where someone has run an eigendecomposition or SVD on an adjacency matrix and assumed that the embedding they end up with is meaningful. Some of them noticed that this embedding didn’t represent their graph very well. There’s a simple explanation for that — it’s wrong. In this post I’ll try and explain why.

For a graph with n nodes, an adjacency matrix is an nxn matrix whose ijth entry represents the weight of the edge connecting node i and node j. The entries of this matrix are all non-negative and the matrix must be symmetric. (There are ways to handle non-symmetric matrices, that is, directed graphs, but they require significantly more care to embed appropriately.)

Now remember, eigendecompositions or SVDs are numeric algorithms that don’t know that the content of this matrix represents a graph. They regard the rows of the adjacency matrix as vectors in an n-dimensional vector space — and this view does not fit very well with the graph that this matrix is representing. For example, a well-connected node in the graph has a corresponding row with many non-zero entries; as a vector, then, it is quite long and so the point corresponding to its end is far from the origin. A poorly connected node, on the other hand, has mostly zero entries in its row, so it corresponds to a short vector. All of the entries of the adjacency matrix are non-negative, so all of these vectors are in the positive hyperquadrant.

The cloud of points corresponding to the graph therefore looks like this figure:

graphcloud

where the red area represents the well-connected nodes of the graph. The eigendecomposition/SVD of this cloud corresponds to a rotation to new axes and (usually) a projection to a lower-dimensional space. There are several problems with this.

First, the well-connected nodes are on the outside of the cloud, but they should be in the middle — they are important and so should be central. Second, the well-connected nodes should be close to one another in general but they are spread along the outer shell of the cloud. In other words, the cloud derived from the adjacency matrix is inside-out with respect to the natural and expected structure of the graph. Any embedding derived from this cloud is going to inherit its inside-out structure and so will be close to useless.

There is also an equally serious issue: the direction of the first eigenvector of such a cloud will be the vector from the origin to ‘the center of the cloud’ because the numerically greatest variation is between the origin and this center. This vector is shown as black in the figure. So far so good: projection onto this vector does indeed provide an importance ranking for the graph nodes, with the most important projected onto the end away from the origin.

However, the second and subsequent axes are necessarily orthogonal to this first axis — but directions orthogonal to it do not tell us anything about the variation within the cloud. If we took exactly the same shaped cloud and moved it a little in the positive hyperquadrant, the first axis would change, forcing changes in all of the other axes, but the shape of the cloud has not changed! In other words, all of the axes after the first are meaningless as measures of variation in the graph.

The right way to embed a graph is to convert the adjacency matrix to one of several Laplacian matrices. This conversion has the effect of centering the cloud around the origin so that the eigendecomposition/SVD now finds the axes in which the cloud varies, and so gives you the embedding you want.

### A Gentle Introduction to Finding the Most Important Node in Graph-Connected Data

I’ve posted frequently about the problem that analysts face when they are dealing with a set of objects, and connections between pairs of objects. For example, the nodes could be people, and the edges some measure of how much each pair of people communicate; the nodes could be reports and the edges how much each pair of reports mention the same entities; the nodes could be computers and the edges how well connected they are to each other. In other words, the natural data structure to represent the set of knowledge is a graph.

Very often such graphs are large, with thousands of nodes, and thousands of edges. The problem the analyst has is to work out which nodes of the graph deserve the most attention; often most of the graph is boring, or at least it is already well understood.

In what follows, I’ll assume that nodes by themselves have no associated score, but there is some natural way to associate a score with the edges that connect them. Let’s see how such a graph could be built, and so think about how we would keep track of which bits of it might be most important.

If the graph consists of only a single node, there’s not much to say. So consider what happens when a second node is added, and let’s assume that the second node is connected to the first (by a weighted edge). If they aren’t connected, then we might treat it as two separate graphs and two separate analysis problems. The only way to represent the graph with two nodes is with the single edge between them, but it might be plausible to make the length of the edge inversely proportional to the weight of the edge — if the weight is large, there’s a strong connection between them and so the nodes might plausibly be close together. We start to insert some geometry into the graph relationship because geometry is reasonably intuitive to us humans.

Now if we add a third node, there are two possibilities, if we want to continue to make edge length inversely proportional to edge weights (strong = close). If the third node is connected to both of the current two nodes, then the resulting structure is a triangle (whose exact shape depends on the relative size of the edge weights). If the third node is connected to only one of the current nodes, then the shape is a path.

The worst case is that adding each new node requires us to expand into another dimension: 3 nodes needs 2 dimensions, 4 nodes needs 3 dimensions, and so on, if we want to preserve the relationship between edge length and edge weight.

In this representation, nodes that are important because they are strongly connected to other important nodes (note the circularity) tend to be pulled towards the “middle” of the graph structure. So if we want to find the most important nodes, we should look near the middle.

But we don’t have that much geometry so, although the idea of the middle is intuitive, finding the middle is rather difficult. One approach is widely used in Social Network Analysis — the middle can (sort of) be found by computing how far it is from each node to the outer surface of the graph, and then looking at those nodes that are farthest inside. This works, after a fashion, but the algorithms can be complex and they tend not to tell you much else about the graph structure.

What about trying to use geometry directly and embedding the graph into an (n-1)-dimensional space? If this can be made to work then “close” actually will mean close, say in Euclidean distance.

The first possible way to do this is to build the adjacency matrix of the graph. For a graph with n nodes, this just means forming an n by n matrix with one row and column corresponding to each node. The matrix entries are zero, except that when node i and node j are connected, then entries ij and ji are set to the edge weight of the connection. We can actually dispense with the last column, since its contents can be inferred from the other columns.

Now the rows of any matrix can be interpreted geometrically by treating the entries as coordinates in a space spanned by the columns. So we can take our n by n-1 adjacency matrix and create an (n-1)-dimensional space with a point corresponding to each row of the matrix, and so to each node of the graph.

This geometry for the graph is still a bit awkward, because an (n-1)-dimensional space is still an ugly place to be (our intuitions don’t work very well in high-dimensional spaces). But if all you want to do is find the most important few nodes, then this is now easy.

Just for clarity, imagine that each entry of each row of the matrix has been divided by the sum of that row, so the entries are all less than 1. Now the embedded representation of the graph lives inside the (n-1)-dimensional (hyper)cube with one corner at the origin and the other corner (1,1,1,…..,1). The nodes with lots of high-weight edges to lots of other nodes are represented by points that are quite far from the origin, because the entries in their rows are comparatively large, and there are comparatively many non-zero entries. The nodes that are poorly connected to the rest of the graph are represented by points quite close to the origin because their rows have values that are mostly zeroes, and otherwise small. The most important nodes, then, will be on the “outer” edge of the geometric representation.

Now if we can find the direction out from the origin in which the “cloud” representing the points of the graph sticks out the most, then the extremal points in this direction will correspond to the most important nodes. Note that this direction isn’t necessarily towards the most extremal point (although we would usually expect it to be generally in that direction).

The direction in which the cloud bulges out the most from the origin is given by the principal eigenvector of the matrix representing the graph. Although this sounds complicated, it’s actually straightforward to compute. What’s more, the position of every node can be projected onto this vector and interpreted as a relative measure of the importance of that node. This is the basis for Google’s PageRank algorithm which treats the web as a giant graph, and does (more or less) this calculation to give every page an importance score. Note that this solves the problem of having to work in n-1 dimensions by projecting the entire graph structure onto a 1-dimensional space.

But there’s something badly wrong with this geometry. Suppose node i is connected to node j with weight 3, so in the geometry the point that corresponds to it is 5 units from the origin in dimension j. Now suppose that the weight on this edge is increased to 10. The point corresponding to node i suddenly moves further from the origin in dimension 3, and so further from all of the other points it is connected to, including point j. But an increase in weight is supposed to mean that i and j are closer! The problem is that, in this simple geometric embedding, the highly connected nodes are all far from the origin, on the outside of the graph structure, which tends to force them to be geometrically far from one another rather than close to one another.

The calculation of a single global importance works because large edge weights create large distances in the geometry and so tend to push nodes with edges like that far out. But in doing so messes up any other geometry among the nodes.

The solution is to use a different geometric embedding that turns the graph inside out so the the points that represent well-connected nodes are positioned close to the origin, and the sparsely connected nodes are positioned on the outer edges. This is done by converting the adjacency matrix, with its rows normalized by dividing by the rows sums, into a walk Laplacian matrix. This is done by negating all of the entries of the normalized adjacency matrix and adding 1’s down the diagonal.

The global variation among the nodes of the graph is now discovered by computing an eigendecomposition of this matrix (again straightforward) and looking at the eigenvector corresponding to the smallest non-zero eigenvalue. The intuition in this embedding is that this eigenvector points along the direction of minimal vibration in the embedded graph structure. Think of a cigar-shaped graph where the nodes are joined by metal rods. Striking it parallel to its long axis will produce much smaller vibrations than striking it parallel to any of the other axes. (In fact, the values of the eigenvector can be interpreted as the amplitude of the associated vibration of each node and the eigenvalue as the frequency of the vibration).

This approach will produce the same answer for the global importance ranking of the nodes, and so will find the same most important nodes. But the geometry is much better behaved. The distances between the embedded points behave as we would want them to (short edges = heavy weights) so clustering algorithms can be applied directly in the embedded space. The other eigenvectors also properly capture the substructure of the graph — for example, the eigenvector corresponding to the second-smallest non-zero eigenvalue describes the next most significant uncorrelated variation in the graph (in fact, the vibrational mode if the graph is struck parallel to the second axis of variation).

A geometric embedding is much more convenient if the graph is being updated. One problem with computing properties of a graph directly is that adding a single node can completely change all of the paths and all of the distances — so a social network measure has to be recomputed from scratch. In a geometric space, adding a new point doesn’t change the existing distances at all. An approximation to the position of the new node can be computed by calculating an adjacency matrix row for it, converting to a walk Laplacian row, and then using the existing eigendecomposition to embed it. If the new point really does change the structure of the graph, then this approximation is not useful; but most of the time it probably does not.

In fact, this embedding has a few other features that are not so obvious. The geometric embedding takes into account not just the (weighted) distance between nodes but also the possible presence of multiple paths between nodes. In other words, two nodes connected by lightweight paths, but many of them, will appear more similar than if only the weights had been taken into account. This is usually the right way to think about global relationships between unconnected nodes. In other words, this geometric embedding takes into account not only distance, but also density and curvature in the original graph.

This approach to finding importance is called spectral graph theory, and there is a large literature on it. But it is easy to lose sight of the wood for the trees. This short blog is intended to provide a more intuitive overview of the main point of using spectral graph techniques. In particular, if you already use PCA, it’s often possible to compute correlations of records to create a graph structure, and then analyse the structure using the approach described here. Almost always, clearer structures can be seen.

### Analyzing graph/relational data

One of the current puzzles is why knowledge discovery techniques for graph data do not perform as well, in practice, as they should in theory. The Netflix prize competition, which asks teams to predict user ratings of new movies based on several years of data about previous ratings, has turned out to be surprisingly difficult.

Ronald Coifman’s invited talk at the SIAM Data Mining Conference had something to add to this approach. He showed that the spectral approach to graph analysis, which works with eigenvectors of some matrix derived from the adjacency matrix of the graph, is really the same underneath as a wavelet approach, in which the structure in the graph is analyzed at varying scales. He has applied these ideas to graphs in which the edge affinities are derived from the thresholded pairwise affinities of data records, which makes it straightforward to turn attributed data into graph data without having to commit to a particular set of attributes in advance. This makes the approach easy to apply to data such as images and audio where there are a very large number of attributes.

The abstract of the talk is here, and slides may eventually be posted on this site as well.

Maggioni’s web site is a good place to read more.