Connecting the dots is hard

In the aftermath of the Christmas Day attempted in-flight bombing, the issue of whether intelligence agencies should have been able to “connect the dots” beforehand has once again been heavily discussed.

Putting together disparate pieces of information to discover a pattern of interest is much more difficult than it looks, but the reason is subtle.

First, and obviously, it’s much easier to find pieces of information that fit into a pattern when you already know the pattern, so connecting the dots after an incident always looks easy. This is just another way of saying that hindsight is 20/20.

What about the problem of putting together the pieces before an incident? Let’s suppose, for simplicity, that the pieces of information are simple factoids: this person did something, said something, or bought something that might be suspicious or interesting. We could give each of these factoids a weight indicating how important we suppose it to be, perhaps based on its inherent unusualness or its connection to a perceived risk, and perhaps also on how reliable it is.
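To make the idea of a weighted factoid concrete, here is a minimal sketch. The `Factoid` class, the 0-to-1 scales, the example scores, and the choice to multiply unusualness by reliability are all assumptions made for illustration; nothing here describes how any real intelligence system scores information.

```python
from dataclasses import dataclass


@dataclass
class Factoid:
    person: str
    description: str
    unusualness: float  # hypothetical scale: 0.0 (routine) to 1.0 (very unusual)
    reliability: float  # hypothetical scale: 0.0 (rumour) to 1.0 (confirmed)

    @property
    def weight(self) -> float:
        # Assumed combination rule: an unusual but unreliable report
        # counts for less than an unusual and well-confirmed one.
        return self.unusualness * self.reliability


# Made-up factoids with made-up scores.
factoids = [
    Factoid("P1", "bought chemical components of an explosive", 0.9, 0.5),
    Factoid("P2", "bought Jello", 0.1, 0.9),
    Factoid("P3", "made an ambiguous remark on a forum", 0.6, 0.8),
]

# Rank factoids from most to least heavily weighted.
ranked = sorted(factoids, key=lambda f: f.weight, reverse=True)
for f in ranked:
    print(f"{f.weight:.2f}  {f.person}: {f.description}")
```

Even in this toy version, a well-confirmed ambiguous remark outranks a poorly confirmed purchase of explosive components, which shows how much the ranking depends on scores that somebody has to guess beforehand.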

Even this simple first step is not straightforward because the weight depends to some extent on which modes of attack we perceive as plausible: buying the chemical components of an explosive seems like it should be high weight; but many other actions have ambiguous possible weights depending on what we (implicitly, beforehand) think is plausible or likely. Buying Jello might indicate an interest in growing bacteria, or just a taste for a cheap dessert.

And if assigning weights to individual factoids is difficult, the difficulty is compounded by the sheer number of such factoids that exist. I don’t have any hard information, but from public statements we could estimate that perhaps 10,000 potential terrorists are being tracked around the world; on any given day, the number of factoids generated by their actions, communications, and web traffic could easily be hundreds of times greater.

So, it’s no surprise that individual factoids get underweighted when they first enter intelligence systems. The net effect of the failure to detect the recent attack is likely to be that all factoids are given more weight; but since the increase applies across the board, the relative weights do not change, and so it has no effect at all (except to keep already overstretched intelligence officers busier).
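The point about giving every factoid more weight can be checked with a few lines of code. This is only a sketch with invented weights and a hypothetical `top_k` helper: if attention goes to the highest-weighted factoids, multiplying every weight by the same factor leaves the list of factoids that get attention unchanged.

```python
def top_k(weights, k):
    """Return the names of the k highest-weighted factoids."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Invented weights for four factoids.
weights = {"f1": 0.2, "f2": 0.05, "f3": 0.45, "f4": 0.1}

# After a missed attack, suppose every factoid is taken three times as seriously.
boosted = {name: 3 * w for name, w in weights.items()}

print(top_k(weights, 2))
print(top_k(boosted, 2))
```

Both calls pick out the same two factoids, because a uniform boost does not change the ordering.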

But this is the easy part. The connecting of these factoid dots is much, much harder.

First, the existence of a connection between two factoids can change (perhaps dramatically) the weight associated with both of them. So, theoretically at least, the potential association between each pair of factoids should be explored. In complexity terms, the number of comparisons is quadratic in the number of factoids: if there are 100 factoids, then there are roughly 100×100 possible connections (strictly, 100×99/2 = 4,950 distinct pairs, which still grows quadratically). Calculations that have complexity quadratic in the size of their inputs are just on the boundary of the practically doable: possible for small numbers of inputs, but taking too long for larger numbers of inputs. For 1,000,000 factoids per day, the number of pairwise connections to check is on the order of 1,000,000,000,000, just doable on special-purpose hardware at a central site. In practice, I suspect that only a smallish subset of these connections is actually considered in real time, so there is now the possibility of failing to connect two dots just because there are a lot of dots and therefore a lot more possible connections.
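The quadratic growth is easy to see by counting. The sketch below prints both the rough n×n figure used above and the exact number of distinct pairs; the function names are mine, chosen for illustration.

```python
def ordered_pairs(n: int) -> int:
    """The rough n x n count of possible connections."""
    return n * n


def distinct_pairs(n: int) -> int:
    """Exact number of pairs of different factoids: n choose 2."""
    return n * (n - 1) // 2


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} factoids: {ordered_pairs(n):>17,} (n x n), "
          f"{distinct_pairs(n):>15,} distinct pairs")
```

Either way of counting gives the same message: multiplying the number of factoids by 100 multiplies the number of possible connections by about 10,000.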

It gets worse. When a new connection changes the weights of the factoids that it connects, these changed weights now affect other factoids to which they are connected; and these in turn propagate a changed weight to the factoids to which they are connected; and so on. In other words, discovering a new connection between two factoids can alter the perceived weight of many, even all, of the other factoids. This means, among other things, that it’s hard to work with just part of the graph, because a change made somewhere else can radically change the meaning of the part.
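Here is a toy illustration of this ripple effect. The propagation rule is an assumption invented for the example (each hop passes on half of the weight change, stopping once the change is negligible), and `connect` and `propagate` are hypothetical helpers, not a description of any real system.

```python
from collections import deque


def connect(graph, a, b):
    """Record an undirected connection between factoids a and b."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def propagate(graph, weights, start_nodes, boost=1.0, damping=0.5, floor=0.01):
    """Raise the weight of start_nodes, then spread a damped increase outward.

    Assumed rule: each hop away from the new connection receives `damping`
    times the increase of the previous hop, until it falls below `floor`.
    Returns the set of factoids whose weight changed.
    """
    changed = set()
    seen = set(start_nodes)
    queue = deque((node, boost) for node in start_nodes)
    while queue:
        node, delta = queue.popleft()
        weights[node] += delta
        changed.add(node)
        next_delta = delta * damping
        if next_delta < floor:
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, next_delta))
    return changed


# Two separate clusters of factoids: f0-f1-f2-f3 and f4-f5.
graph = {}
for a, b in [("f0", "f1"), ("f1", "f2"), ("f2", "f3"), ("f4", "f5")]:
    connect(graph, a, b)
weights = {node: 1.0 for node in graph}

# A newly discovered connection joins the two clusters.
connect(graph, "f3", "f4")
changed = propagate(graph, weights, ["f3", "f4"])
print(sorted(changed))
print(weights)
```

One new connection between f3 and f4 ends up changing the weight of every factoid in both clusters, including f0, which is three hops away from it.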

It gets even worse. The connections themselves, and aggregates of the connections, can become meta-factoids. For example, the fact that person A communicates with person B via phone but person B communicates with person A by email is potentially a useful factoid, revealing something about the communication infrastructure each has access to, their attitudes to it (perhaps their perception about security risks of different technologies), and even that they are trying to communicate covertly (since most people communicate symmetrically). The connections between factoids create a web or graph whose structure at many different levels can reveal relationships among the factoids that change their individual significance.
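The asymmetric-communication example can be sketched as a small computation over a log of messages. The record format `(sender, recipient, channel)` and the `asymmetric_pairs` function are assumptions made up for this illustration.

```python
from collections import defaultdict


def asymmetric_pairs(messages):
    """Find pairs of people who use different channels in each direction.

    messages: iterable of (sender, recipient, channel) tuples.
    Returns a list of (a, b, channels a->b, channels b->a), one per pair.
    """
    channels = defaultdict(set)
    for sender, recipient, channel in messages:
        channels[(sender, recipient)].add(channel)

    found = []
    for (a, b), a_to_b in channels.items():
        b_to_a = channels.get((b, a))
        # Only pairs that talk in both directions; a < b reports each pair once.
        if a < b and b_to_a and a_to_b != b_to_a:
            found.append((a, b, sorted(a_to_b), sorted(b_to_a)))
    return found


# Made-up message log.
messages = [
    ("A", "B", "phone"),
    ("B", "A", "email"),
    ("C", "D", "email"),
    ("D", "C", "email"),
    ("E", "F", "phone"),
]
print(asymmetric_pairs(messages))
```

The output is itself a new factoid about A and B, derived purely from the shape of the connections rather than from the content of any single message.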

The bottom line is that it’s extremely hard to find sets of dots with interesting connections. Part of this is the sheer complexity of the data structure and the algorithms that would be required. But we actually don’t know much about which algorithms would be useful. These webs or graphs have many emergent properties. If we understood them, there would surely be ways to focus attention on only those parts of the data that have the greatest potential to lead to interesting factoids and connections. Network science is emerging as a new area of research where exactly these kinds of questions are being explored, but it is in its infancy, and we know only the most rudimentary properties of such structures: the common appearance of power laws, preferential attachment as a construction mechanism, some measures of importance of nodes within a graph, and so on. But the big theories remain elusive.
