Posts Tagged 'connecting the dots'

Islamist violent extremism and anarchist violent extremism

Roughly speaking, three explanations for islamist violent extremism have been put forward:

  1. It’s motivated by a religious ideology (perhaps a perversion of true Islam, but sincerely held by its adherents);
  2. It’s motivated by political or insurgent ends, and so the violence is instrumental;
  3. It’s the result of psychological disturbance in its adherents.

In the months after the 9/11 World Trade Center attacks, Marc Sageman argued vigorously for the first explanation, pointing out that those involved in al Qaeda at the time were well-educated and at least middle class, were religious, and showed no signs of psychological disturbances. There was considerable push back to his arguments, mostly promoting Explanation 3 but, in the end, most Western governments came around to his view.

In the decade since, most Western countries have slipped into Explanation 2. I have argued that this is largely because these countries are post-Christian, and so most of those in the political establishment have post-modern ideas about religion as a facade for power. They project this world view onto the Middle Eastern world, and so cannot see that Explanation 1 is even possible — to be religious is to be naive at best and stupid at worst. This leads to perennial underestimation of islamist violent extremist goals and willingness to work towards them.

It’s widely agreed that the motivation for Daesh is a combination of Explanations 1 and 2, strategically Explanation 1, but tactically Explanation 2.

The new feature, however, is that Daesh’s high-volume propaganda is reaching many psychologically troubled individuals in Western countries who find its message to be an organising principle and a pseudo-community.

“Lone wolf” attacks can therefore be divided into two categories: those motivated by Explanation 1, and those motivated by Explanation 3, and the latter are on the rise. Marc Sageman has written about the extent to which foiled “plots” in the U.S. come very close to entrapment of vulnerable individuals who imagine that they would like to be terrorists, and take some tiny initial step, only to find an FBI agent alongside them, urging them to take it further. (M. Sageman, The Stagnation in Terrorism Research, Terrorism and Political Violence, Vol. 26, No. 4, 2014, 565-580)

Understanding these explanations is critical to efforts at de-radicalization. Despite extensive efforts, I have seen very little evidence that de-radicalization actually works. But it make a difference what you think you’re de-radicalizing from. Addressing Explanation 1 seems to be the most common strategy (“your view of Islam is wrong, see the views of respected mainstream Imams, jihad means personal struggle”).

Addressing Explanation 2 isn’t usually framed as de-radicalization but, if the violence is instrumental, then instrumental arguments would help (“it will never work, the consequences are too severe to be worth it”).

Addressing Explanation 3 is something we know how to do, but this explanation isn’t the popular one at present, and there are many pragmatic issues about getting psychological help to people who don’t acknowledge that they need it.

Reading the analysis of anarchist violence in the period from about 1880 to around 1920 has eerie similarities to the analysis of islamist violence in the past 15 years, both in the popular press, and in the more serious literature. It’s clear that there were some (but only a very few) who were in love with anarchist ideology (Explanation 1); many more who saw it as a way (the only way) to change society for the better (Explanation 2) — one of the popular explanations for the fading away of anarchist attacks is that other organisations supporting change developed; but there were also large numbers of troubled individuals who attached themselves to anarchist violence for psychological reasons. It’s largely forgotten how common anarchist attacks became during these few decades. Many were extremely successful — assassinations of a French president, an American president, an Austrian Empress, an Italian king — and, of course, the Great War was inadvertently triggered by an assassination of an Archduke.

Western societies had little more success stemming anarchist violence than we are having with islamist violence. The Great War probably had as much effect as anything, wiping out the demographic most associated with the problem. We will have to come up with a better solution.

(There’s a nice recap of anarchist violence and its connections to islamist violence here.)

Radicalisation as infection

I’ve argued in previous posts that the process of radicalisation is one that depends largely on properties of the individual, rather than on grand social or moral drivers — personality rather than society — and that it depends on the presence of an actual person (already radicalised) who makes the potential ideas real.

There is an alternative. Woo, Son, and Chen (J. Woo, J. Son, and H. Chen. An SIR model for violent topic diffusion in social media. In Proceedings of 2011 IEEE International Conference on Intelligence and Security Informatics, ISI 2011, July 2011) show that radicalisation behaves a little bit like an infection (at least in the domain of ideas which they measure from forum postings). They show that the SIR (Susceptible-Infected-Recovered) model of disease transmission fits the data fairly well. In this model, members of a population begin in the susceptible state; they become infected with some probability A, and then recover with some probability B. After they’ve recovered they are no longer susceptible.

For the data they looked at, A was of the magnitude of 10^-4, so about 1 in 10,000 becomes infected. Once infected, B varied depending on the intensity of the topic from around 0.65 to 0.96. In other words, the probability of a ‘cure’ is well above a half, sometimes virtually certain.

This model suggests some interesting probabilities. First, it suggest that radicalisation is a state that can cure itself; in other words, we shouldn’t necessarily assume that once radicalised means always radicalised. Second, there may be a greater pool of people who pass through the stage of being radicalised but do not get it together to actually act on it before the fever breaks — perhaps because they don’t get the right training or the right opportunity at the time when they would exploit it if they could.

The numbers work out about right. There are around a million Muslims in the U.S. but the number who have (attempted to) carry out attacks is in the small number of dozens.

Questions are data too

In the followup investigation of the Boston Marathon bombings, we see again the problem that data analytics has with questions.

Databases are built to store data. But, as Jeff Jones has most vocally pointed out, simply keeping the data is not enough in adversarial settings. You also need to keep the questions, and treat them as part of the ongoing data. The reason is obvious once you think about it — intelligence analysts need not only to know the known facts; they also need to know that someone else has asked the same question they just asked. Questions are part of the mental model of analysts, part of their situational awareness, but current systems don’t capture this part and preserve it so that others can build on it. In other words, we don’t just need to connect the dots; we need to connect the edges!

Another part of this is that, once questions are kept, they can be re-asked automatically. This is immensely powerful. At present, an analyst can pose a question (“has X ever communicated with Y?”), get a negative answer, only for information about such a communication to arrive a microsecond later and not be noticed. In fast changing environments, this can happen frequently, but it’s implausible to expect analysts to remember and re-pose their questions at intervals, just in case.

We still have some way to go with the tools and techniques available for intelligence analysis.

The power is in the edges

I’ve argued that it isn’t social media unless there are relational edges between individuals or individual objects. These edges are the drivers of power because the graph structure that emerges from them reveals a lot more than the individual nodes and edges do.

The number of LinkedIn contacts I have is now large enough that I can tell this story. I know someone from one of the more secretive US government organizations. His (or it might be her) public web presence, of course, has nothing at all to do with his day job, and we’ve never exchanged emails using his public email. Yet LinkedIn suggests him as someone I might possibly know.

The reason must be that we have enough mutual connections that the software LinkedIn uses sees that there “should” be some connection between us — it is doing edge prediction. This is exactly the kind of analysis that good intelligence tools can do on relational/graph data. The knowledge is in the links, collectively; in other words, noticing the potential link between us requires knowing both the presence of some links and the absence of others (because the system doesn’t recommend other people whose web presence is as dissimilar from mine as his is).

So, well done LinkedIn, but a cautionary tale for security folks generally, and especially those who believe in anonymization — it can’t be done!

“Info Systems Must ‘Connect Dots’ on Terrorism” by Mannes and Hendler

There’s a new article in Defense News making some of the same points I’ve been making here: that accumulating data is not enough; what’s missing is tools to process the data and point out to human analysts what and where the important stuff is.

Mannes and Hendler make two good points: that it’s not about more data; and it’s not about being able to fuse data from different sources, although these are both good things. The problem is that there is still too much data to find out anything interesting, especially in a timely way.

I do believe that it’s important and useful to be able to automatically deduce what kind of object a given string describes, and so hypothesise what its attributes might be — so that people names come with locations and relatives. I’m not convinced that this has much to do with the Semantic Web (and I’m fairly sure you can generate a knock down fight by asking any two Semantic Web researchers to give a precise definition of what it is).

It’s sad, in a way, that the technology that is used as the ubiquitous metaphor for post-9/11 knowledge discovery (connecting the dots) is still the piece that isn’t being done! (I made the same point about the Australian government’s white paper which has a gaping hole about exactly this issue.)

Thoughts on the Australian Government White Paper on Counter-terrorism

The Australian government has just released a White Paper updating their policy on counterterrorism. Most of the content is eminently sensible, but there are a couple of questionable assumptions and/or directions.

1.  The section on resilience assumes that radicalisation can be mitigated by “reducing disadvantage” using government actions to address social and economic issues. This may well be so, but I don’t think there’s much evidence to support it. It’s clear that there are countries where economic and social grievances are significant drivers for radicalisation (e.g. Southern Thailand); but the results of a recent survey in Canada with which I was involved showed clearly that attitudes about economic and social issues were uncorrelated to radicalism. Although many Islamic immigrants to Canada (and indeed many immigrants) struggle with e.g. access to jobs, this does not seem to turn into a sense of grievance that might lead to radicalisation. Australia may be different, but there doesn’t seem to be any particular reason why it should be.

2. The section on intelligence-led counterterrorism talks about three components: the ability to collect; the ability to analyse; and the ability to share. There is existing capacity and proposed actions for the first and the the third — but there is a great black hole in both existing capacity and proposed action for the second: analysis.

It’s easy to skip over this word and assume what it means; but I suspect that, when it’s unpacked, it tends to be taken to mean either “looking stuff up” or “having a human put stuff together to discover its significance”. It doesn’t take much thought to realise that this can’t be enough. The challenge in intelligence is (a) deciding how important each dot is, and (b) finding the interesting constellations of dots from among the many possible constellations. In practice, the number of dots is in the thousands (and up) each day, so this process must be largely automated.

There is a strange blind spot about the role and importance of analysis. I suspect that this is mostly because it’s not obvious how powerful inductive data modelling can be and it’s not on the conceptual map of most people, especially those whose training has been in the humanities and social sciences. But talking about collection and sharing without talking about analysis is like a sandwich without the filling — and you don’t make a better sandwich by improving the quality of the bread, if there’s still no filling.

Analysis is tough for intelligence agencies, who are fighting a battle to upgrade their capabilities at the same time as meeting the real-time challenges of what analysis they can already do. And, although data mining/knowledge discovery is a well-developed subject, adversarial data mining, which I’ve often argued here is quite a different subject, has received little attention. One way that governments can help is to let some of this upgrading happen in universities. As far as I am aware, there is almost no work on counterterrorism analysis happening in Australian universities, and the possibility gets only a tiny mention in the National Security Science and Innovation Strategy. There are several research groups looking at the social aspects of terrorism and counterterrorism, and one or two looking at the forensic aspects of data analysis, but a conspicuous absence of work on data analysis as a preventive and preemptive tool.

A part of the report that has attracted media attention is the intent to impose special visa requirements for applicants from 10 as-yet-unidentified countries (but the US imposed special requirements on 10 countries so it probably isn’t too hard to guess the list). Two parts of this are problematic. First, it will use new biometrics — although this seems to be a grand way of talking about fingerprints and facial photos. Biometrics get over-trusted; they are mostly relatively easy to spoof. Second, the report promises to use “advanced data analysis and risk profiling” to identify risky visa applicants. It’s hard to know what to make of this,  but it sounds like either something quite weak, or something with unworkably high false-positive and false-negative rates.

3.  The problem with treating home-grown terrorism as a law enforcement problem is that catching and sentencing those who have planned or carried out attacks doesn’t do anything for those who are “next in line”. There’s a risk that dealing with a home-grown group simply radicalises their supporters to the point of violence. For example, this seems to be a potential risk after the sentencing of five men last week.

Other countries, for example Thailand and Saudi Arabia (although with questionable success), take a wider view and try to deradicalise those whose involvement with terrorist activity is marginal. In other words, any criminal events in the terrorism area are regarded as the tip of an iceberg; and other approaches (sometimes called “smart power”) are used to address the less-visible hinterland of the criminal event. While a law enforcement approach is good, there seems to be some scope for a wider approach to the problem. And the great majority of home-grown attacks have been discovered and prevented because of the actions of a whistle-blower within the attackers’ community, so motivating such whistle-blowing and making it easy seems like it should be a centrepiece of any proposed strategy.

A Gentle Introduction to Finding the Most Important Node in Graph-Connected Data

I’ve posted frequently about the problem that analysts face when they are dealing with a set of objects, and connections between pairs of objects. For example, the nodes could be people, and the edges some measure of how much each pair of people communicate; the nodes could be reports and the edges how much each pair of reports mention the same entities; the nodes could be computers and the edges how well connected they are to each other. In other words, the natural data structure to represent the set of knowledge is a graph.

Very often such graphs are large, with thousands of nodes, and thousands of edges. The problem the analyst has is to work out which nodes of the graph deserve the most attention; often most of the graph is boring, or at least it is already well understood.

In what follows, I’ll assume that nodes by themselves have no associated score, but there is some natural way to associate a score with the edges that connect them. Let’s see how such a graph could be built, and so think about how we would keep track of which bits of it might be most important.

If the graph consists of only a single node, there’s not much to say. So consider what happens when a second node is added, and let’s assume that the second node is connected to the first (by a weighted edge). If they aren’t connected, then we might treat it as two separate graphs and two separate analysis problems. The only way to represent the graph with two nodes is with the single edge between them, but it might be plausible to make the length of the edge inversely proportional to the weight of the edge — if the weight is large, there’s a strong connection between them and so the nodes might plausibly be close together. We start to insert some geometry into the graph relationship because geometry is reasonably intuitive to us humans.

Now if we add a third node, there are two possibilities, if we want to continue to make edge length inversely proportional to edge weights (strong = close). If the third node is connected to both of the current two nodes, then the resulting structure is a triangle (whose exact shape depends on the relative size of the edge weights). If the third node is connected to only one of the current nodes, then the shape is a path.

The worst case is that adding each new node requires us to expand into another dimension: 3 nodes needs 2 dimensions, 4 nodes needs 3 dimensions, and so on, if we want to preserve the relationship between edge length and edge weight.

In this representation, nodes that are important because they are strongly connected to other important nodes (note the circularity) tend to be pulled towards the “middle” of the graph structure. So if we want to find the most important nodes, we should look near the middle.

But we don’t have that much geometry so, although the idea of the middle is intuitive, finding the middle is rather difficult. One approach is widely used in Social Network Analysis — the middle can (sort of) be found by computing how far it is from each node to the outer surface of the graph, and then looking at those nodes that are farthest inside. This works, after a fashion, but the algorithms can be complex and they tend not to tell you much else about the graph structure.

What about trying to use geometry directly and embedding the graph into an (n-1)-dimensional space? If this can be made to work then “close” actually will mean close, say in Euclidean distance.

The first possible way to do this is to build the adjacency matrix of the graph. For a graph with n nodes, this just means forming an n by n matrix with one row and column corresponding to each node. The matrix entries are zero, except that when node i and node j are connected, then entries ij and ji are set to the edge weight of the connection. We can actually dispense with the last column, since its contents can be inferred from the other columns.

Now the rows of any matrix can be interpreted geometrically by treating the entries as coordinates in a space spanned by the columns. So we can take our n by n-1 adjacency matrix and create an (n-1)-dimensional space with a point corresponding to each row of the matrix, and so to each node of the graph.

This geometry for the graph is still a bit awkward, because an (n-1)-dimensional space is still an ugly place to be (our intuitions don’t work very well in high-dimensional spaces). But if all you want to do is find the most important few nodes, then this is now easy.

Just for clarity, imagine that each entry of each row of the matrix has been divided by the sum of that row, so the entries are all less than 1. Now the embedded representation of the graph lives inside the (n-1)-dimensional (hyper)cube with one corner at the origin and the other corner (1,1,1,…..,1). The nodes with lots of high-weight edges to lots of other nodes are represented by points that are quite far from the origin, because the entries in their rows are comparatively large, and there are comparatively many non-zero entries. The nodes that are poorly connected to the rest of the graph are represented by points quite close to the origin because their rows have values that are mostly zeroes, and otherwise small. The most important nodes, then, will be on the “outer” edge of the geometric representation.

Now if we can find the direction out from the origin in which the “cloud” representing the points of the graph sticks out the most, then the extremal points in this direction will correspond to the most important nodes. Note that this direction isn’t necessarily towards the most extremal point (although we would usually expect it to be generally in that direction).

The direction in which the cloud bulges out the most from the origin is given by the principal eigenvector of the matrix representing the graph. Although this sounds complicated, it’s actually straightforward to compute. What’s more, the position of every node can be projected onto this vector and interpreted as a relative measure of the importance of that node. This is the basis for Google’s PageRank algorithm which treats the web as a giant graph, and does (more or less) this calculation to give every page an importance score. Note that this solves the problem of having to work in n-1 dimensions by projecting the entire graph structure onto a 1-dimensional space.

But there’s something badly wrong with this geometry. Suppose node i is connected to node j with weight 3, so in the geometry the point that corresponds to it is 5 units from the origin in dimension j. Now suppose that the weight on this edge is increased to 10. The point corresponding to node i suddenly moves further from the origin in dimension 3, and so further from all of the other points it is connected to, including point j. But an increase in weight is supposed to mean that i and j are closer! The problem is that, in this simple geometric embedding, the highly connected nodes are all far from the origin, on the outside of the graph structure, which tends to force them to be geometrically far from one another rather than close to one another.

The calculation of a single global importance works because large edge weights create large distances in the geometry and so tend to push nodes with edges like that far out. But in doing so messes up any other geometry among the nodes.

The solution is to use a different geometric embedding that turns the graph inside out so the the points that represent well-connected nodes are positioned close to the origin, and the sparsely connected nodes are positioned on the outer edges. This is done by converting the adjacency matrix, with its rows normalized by dividing by the rows sums, into a walk Laplacian matrix. This is done by negating all of the entries of the normalized adjacency matrix and adding 1’s down the diagonal.

The global variation among the nodes of the graph is now discovered by computing an eigendecomposition of this matrix (again straightforward) and looking at the eigenvector corresponding to the smallest non-zero eigenvalue. The intuition in this embedding is that this eigenvector points along the direction of minimal vibration in the embedded graph structure. Think of a cigar-shaped graph where the nodes are joined by metal rods. Striking it parallel to its long axis will produce much smaller vibrations than striking it parallel to any of the other axes. (In fact, the values of the eigenvector can be interpreted as the amplitude of the associated vibration of each node and the eigenvalue as the frequency of the vibration).

This approach will produce the same answer for the global importance ranking of the nodes, and so will find the same most important nodes. But the geometry is much better behaved. The distances between the embedded points behave as we would want them to (short edges = heavy weights) so clustering algorithms can be applied directly in the embedded space. The other eigenvectors also properly capture the substructure of the graph — for example, the eigenvector corresponding to the second-smallest non-zero eigenvalue describes the next most significant uncorrelated variation in the graph (in fact, the vibrational mode if the graph is struck parallel to the second axis of variation).

A geometric embedding is much more convenient if the graph is being updated. One problem with computing properties of a graph directly is that adding a single node can completely change all of the paths and all of the distances — so a social network measure has to be recomputed from scratch. In a geometric space, adding a new point doesn’t change the existing distances at all. An approximation to the position of the new node can be computed by calculating an adjacency matrix row for it, converting to a walk Laplacian row, and then using the existing eigendecomposition to embed it. If the new point really does change the structure of the graph, then this approximation is not useful; but most of the time it probably does not.

In fact, this embedding has a few other features that are not so obvious. The geometric embedding takes into account not just the (weighted) distance between nodes but also the possible presence of multiple paths between nodes. In other words, two nodes connected by lightweight paths, but many of them, will appear more similar than if only the weights had been taken into account. This is usually the right way to think about global relationships between unconnected nodes. In other words, this geometric embedding takes into account not only distance, but also density and curvature in the original graph.

This approach to finding importance is called spectral graph theory, and there is a large literature on it. But it is easy to lose sight of the wood for the trees. This short blog is intended to provide a more intuitive overview of the main point of using spectral graph techniques. In particular, if you already use PCA, it’s often possible to compute correlations of records to create a graph structure, and then analyse the structure using the approach described here. Almost always, clearer structures can be seen.