Posts Tagged 'intelligence analysis'

What causes extremist violence?

This question has been the subject of active research for more than four decades. Many proposed answers do not stand up to empirical scrutiny, partly because the number of people who participate in extremist violence is so small, and partly because researchers tend to interview those who committed violence but fail to interview the otherwise-identical people who didn’t.

Here’s a list of the properties that we now know don’t lead to extremist violence:

  • ideology or religion
  • deprivation or unhappiness
  • political/social alienation
  • discrimination
  • moral outrage
  • activism or illegal non-violent political action
  • attitudes/belief

How do we know this? Mostly because, if you take a population that exhibits any of these properties (typically many hundreds of thousands of people), you find that one or two have committed violence, but the others haven’t. So properties such as these have absolutely no predictive power.
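The arithmetic behind this base-rate problem can be made concrete. A minimal sketch, using illustrative numbers in the spirit of the figures above (hundreds of thousands sharing a property, one or two offenders among them):

```python
# Illustrative numbers only: suppose 500,000 people share some property,
# and 2 of them have committed extremist violence.
population = 500_000
violent = 2

# Even a detector that identifies the property perfectly flags all 500,000.
# The chance that any flagged individual is actually violent:
ppv = violent / population
print(ppv)  # 4e-06, i.e. four in a million
```

Any property with a base rate like this is useless as a predictor, no matter how strongly it correlates with violence among known offenders.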

On the other hand, there are a few properties that do lead to extremist violence:

  • being the child of immigrants
  • having access to a local charismatic figure
  • travelling to a location where one’s internal narrative is reinforced
  • participation in a small group echo chamber with those who have similar patterns of thought
  • having a disconnected-disordered or hypercaring-compelled personality

These don’t form a diagnostic set, because there are still many people who have one or more of them, and do not commit violence. But they are a set of danger signals, and the more of them an individual has, the more attention should be paid to them (on the evidence of the past 15 years).

You can find a full discussion of these issues, and the evidence behind them, in “Terrorists, Radicals, and Activists: Distinguishing Between Countering Violent Extremism and Preventing Extremist Violence, and Why It Matters” in Violent Extremism and Terrorism, Queen’s University Press, 2019.

 


‘AI’ performance not what it seems

As I’ve written about before, ‘AI’ tends to be misused to refer to almost any kind of data analytics or derived tool — but let’s, for the time being, go along with this definition.

When you look at the performance of these tools and systems, it’s often quite poor, but I claim we’re getting fooled by our own cognitive biases into thinking that it’s much better than it is.

Here are some examples:

  • Netflix’s recommendations for any individual user seem to overlap 90% with the ‘What’s trending’ and ‘What’s new’ categories. In other words, Netflix is recommending to you more or less what it’s recommending to everyone else. Other recommendation systems don’t do much better (see my earlier post on ‘The Sound of Music Problem’ for part of the explanation).
  • Google search results are quite good at returning, in the first few links, something relevant to the search query, but we don’t ever get to see what was missed and might have been much more relevant.
  • Google News produces what, at first glance, appear to be quite reasonable summaries of recent relevant news, but when you use it for a while you start to see how shallow its selection algorithm is: it puts stale stories front and centre, and occasionally produces real howlers, weird stories from some tiny venue treated as if they were breaking, critical news.
  • Self-driving cars perform well most of the time, but fail completely when they see certain patches on the road surface. Similarly, facial recognition systems fail when the human is wearing a t-shirt with a particular patch printed on it.

The commonality between these examples, and many others, is that the assessment from use is, necessarily, one-sided — we get to see only the successes and not the failures. In other words (HT Donald Rumsfeld), we don’t see the unknown unknowns. As a result, we don’t really know how well these ‘AI’ systems actually perform, or whether it’s safe to deploy them.

Some systems are ‘best efforts’ (Google News) and that’s fair enough.

But many of these systems are beginning to be used in consequential ways and, for that, real testing and real public test results are needed. And not just true positives, but false positives and false negatives as well. There are two main flashpoints where this matters: (1) systems that are starting to do away with the human in the loop (self-driving cars, 737 MAXs); and (2) systems where humans are likely to say or think ‘The computer (or worse, the AI) can’t be wrong’; and these are starting to include policing and security tools. Consider, for example, China’s social credit system. The fact that it gives low scores to some identified ‘trouble makers’ does not imply that everyone who gets a low score is a trouble maker — but this false implication lies behind this, and almost all other, discussion of ‘AI’ systems.

Annular similarity

When similarity is used for clustering, then obviously the most similar objects need to be placed in the same cluster.

But when similarity is being used for human consumption, a different dynamic is in play — humans usually already know what the most similar objects are, and are interested in those that are (just) beyond those.

This can be seen most clearly in recommender systems. Purchase an item or watch a Netflix show, and your recommendation list will fill up with new objects that are very similar to the thing you just bought/watched.

From a strictly algorithmic point of view, this is a success — the algorithm found objects similar to the starting object. But from a human point of view this is a total fail, because it’s very likely that you, the human, already know about all of these recommended objects. If you bought something, you probably compared the thing you bought with many or all of the objects that are now being recommended to you. If you watched something, the recommendations are still likely to be things you already knew about.

The misconception about what similarity needs to mean to be useful to humans is at the heart of the failure of recommender systems, and even of the ad-serving systems that many online businesses make their money from. Everyone has had the experience of buying something, only to have their ad feed (should they still see it) fill up with ads for similar products (“I see you just bought a new car — here are some other new cars you might like”).

What’s needed is annular similarity — a region that is centred at the initial object, but excludes new objects that are too similar, and focuses instead on objects that are a bit similar.

Amazon tries to do this via “People who bought this also bought” which can show useful add-on products. (They also use “People who viewed this also viewed” but this is much less effective because motivations are so variable.) But this mechanism also fails because buying things together doesn’t necessarily mean that they belong together — it’s common to see recommendations based on the fact that two objects were on special on the same day, and so more likely to be bought together because of the opportunity, rather than any commonality.

Annular similarity is also important in applications that help humans to learn new things: web search, online courses, intelligence analysis. That’s why we built the ATHENS divergent web search engine (refs below) — give it some search terms and it returns (clusters of) web pages containing information that is just over the horizon from those search terms. We found that this required two annuli: we first constructed the information implicit in the search terms; then an annulus around it, containing information we assumed would already be known to anyone who knew that core; and only then a second annulus, containing the results actually returned.

We don’t know many algorithmic ways to find annular similarity. In any distance-based clustering it’s possible, of course, to define an annulus around any point. But it’s tricky to decide on what the inner and outer radii should be, the calculations have to happen in high-dimensional space where the points are very sparse, and it’s not usually clear whether the space is isotropic.
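A minimal sketch of the distance-based version, on toy data (the inner and outer radii are chosen by hand here, which is exactly the hard part noted above):

```python
import numpy as np

def annular_neighbours(points, target, inner_r, outer_r):
    """Indices of points whose distance from `target` lies
    strictly between inner_r and outer_r."""
    d = np.linalg.norm(points - target, axis=1)
    return np.where((d > inner_r) & (d < outer_r))[0]

# Toy example in 2 dimensions; real data is high-dimensional and sparse,
# where choosing sensible radii is much harder.
rng = np.random.default_rng(0)
pts = rng.normal(size=(100, 2))
idx = annular_neighbours(pts, pts[0], inner_r=0.5, outer_r=1.5)
# pts[0] itself (distance 0) and its nearest neighbours are excluded,
# which is the point: skip the too-similar, keep the somewhat-similar.
```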

Annular similarity doesn’t work (at least straightforwardly) in density-based clustering (e.g. DBSCAN) or distribution-based clustering (e.g. EM) because their semantics of ‘cluster’ doesn’t allow for an annulus.

One way that does work (and was used extensively in the ATHENS system) is based on singular value decomposition (SVD). An SVD projects a high-dimensional space into a low-dimensional one in such a way as to preserve as much of the variation as possible. One of its useful side-effects is that a point that is similar to many other points tends to be projected close to the origin; and a point that is dissimilar to most other points also tends to be projected close to the origin, because the dimension(s) it inhabits have little variation and tend to be projected away. In the resulting low-dimensional projection, points far from the origin tend to be interestingly dissimilar to those at the centre of the structure — and so an annulus imposed on the embedding tends to find an interesting set of objects.
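A sketch of this SVD trick in plain NumPy, with hand-picked radii and random data standing in for a real object–attribute matrix (the ATHENS pipeline itself was considerably more elaborate):

```python
import numpy as np

def svd_annulus(X, k, inner_r, outer_r):
    """Embed the rows of X in k dimensions via SVD, then return the
    indices of rows whose distance from the origin lies in the annulus."""
    Xc = X - X.mean(axis=0)                # centre so the origin is meaningful
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    emb = U[:, :k] * S[:k]                 # k-dimensional row embedding
    d = np.linalg.norm(emb, axis=1)
    # Rows projected near the origin are either very ordinary or inhabit
    # low-variation (projected-away) dimensions; an annulus away from the
    # origin picks out the interestingly dissimilar rows.
    return np.where((d > inner_r) & (d < outer_r))[0]

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
interesting = svd_annulus(X, k=2, inner_r=0.1, outer_r=10.0)
```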

Unfortunately this doesn’t solve the recommender system problem because recommenders need to find similar points that have more non-zeroes than the initial target point — and the projection doesn’t preserve this ordering well. That means that the entire region around the target point has to be searched, which becomes expensive.

There’s an opportunity here to come up with better algorithms to find annular structures. Success would lead to advances in several diverse areas.

(A related problem is the Sound of Music problem, the tendency for a common/popular object to muddle the similarity structure of all of the other objects because of its weak similarity to all of them. The Sound of Music plays this role in movie recommendation systems, but think of wrapping paper as a similar object in the context of Amazon. I’ve written about this in a previous post.)

 

Tracy A. Jenkin, Yolande E. Chan, David B. Skillicorn, Keith W. Rogers: Individual Exploration, Sensemaking, and Innovation: A Design for the Discovery of Novel Information. Decision Sciences 44(6): 1021-1057 (2013)

Tracy A. Jenkin, David B. Skillicorn, Yolande E. Chan: Novel Idea Generation, Collaborative Filtering, and Group Innovation Processes. ICIS 2011

David B. Skillicorn, Nikhil Vats: Novel information discovery for intelligence and counterterrorism. Decision Support Systems 43(4): 1375-1382 (2007)

Nikhil Vats, David B. Skillicorn: Information discovery within organizations using the Athens system. CASCON 2004: 282-292

 

Islamist violent extremism and anarchist violent extremism

Roughly speaking, three explanations for islamist violent extremism have been put forward:

  1. It’s motivated by a religious ideology (perhaps a perversion of true Islam, but sincerely held by its adherents);
  2. It’s motivated by political or insurgent ends, and so the violence is instrumental;
  3. It’s the result of psychological disturbance in its adherents.

In the months after the 9/11 World Trade Center attacks, Marc Sageman argued vigorously for the first explanation, pointing out that those involved in al Qaeda at the time were well-educated and at least middle class, were religious, and showed no signs of psychological disturbance. There was considerable pushback against his arguments, mostly promoting Explanation 3 but, in the end, most Western governments came around to his view.

In the decade since, most Western countries have slipped into Explanation 2. I have argued that this is largely because these countries are post-Christian, and so most of those in the political establishment have post-modern ideas about religion as a facade for power. They project this world view onto the Middle Eastern world, and so cannot see that Explanation 1 is even possible — to be religious is to be naive at best and stupid at worst. This leads to perennial underestimation of islamist violent extremist goals and willingness to work towards them.

It’s widely agreed that the motivation for Daesh is a combination of Explanations 1 and 2, strategically Explanation 1, but tactically Explanation 2.

The new feature, however, is that Daesh’s high-volume propaganda is reaching many psychologically troubled individuals in Western countries who find its message to be an organising principle and a pseudo-community.

“Lone wolf” attacks can therefore be divided into two categories: those motivated by Explanation 1, and those motivated by Explanation 3, and the latter are on the rise. Marc Sageman has written about the extent to which foiled “plots” in the U.S. come very close to entrapment of vulnerable individuals who imagine that they would like to be terrorists, and take some tiny initial step, only to find an FBI agent alongside them, urging them to take it further. (M. Sageman, The Stagnation in Terrorism Research, Terrorism and Political Violence, Vol. 26, No. 4, 2014, 565-580)

Understanding these explanations is critical to efforts at de-radicalization. Despite extensive efforts, I have seen very little evidence that de-radicalization actually works. But it makes a difference what you think you’re de-radicalizing from. Addressing Explanation 1 seems to be the most common strategy (“your view of Islam is wrong, see the views of respected mainstream Imams, jihad means personal struggle”).

Addressing Explanation 2 isn’t usually framed as de-radicalization but, if the violence is instrumental, then instrumental arguments would help (“it will never work, the consequences are too severe to be worth it”).

Addressing Explanation 3 is something we know how to do, but this explanation isn’t the popular one at present, and there are many pragmatic issues about getting psychological help to people who don’t acknowledge that they need it.

Reading the analysis of anarchist violence in the period from about 1880 to around 1920 reveals eerie similarities to the analysis of islamist violence in the past 15 years, both in the popular press and in the more serious literature. It’s clear that there were some (but only a very few) who were in love with anarchist ideology (Explanation 1). Many more saw it as a way (the only way) to change society for the better (Explanation 2); indeed, one of the popular explanations for the fading away of anarchist attacks is that other organisations supporting change developed. But there were also large numbers of troubled individuals who attached themselves to anarchist violence for psychological reasons (Explanation 3). It’s largely forgotten how common anarchist attacks became during these few decades. Many were extremely successful: assassinations of a French president, an American president, an Austrian Empress, and an Italian king. And, of course, the Great War was inadvertently triggered by the assassination of an Archduke.

Western societies had little more success stemming anarchist violence than we are having with islamist violence. The Great War probably had as much effect as anything, wiping out the demographic most associated with the problem. We will have to come up with a better solution.

(There’s a nice recap of anarchist violence and its connections to islamist violence here.)

Inspire and Azan paper is out

The paper Edna Reid and I wrote about the language patterns in Inspire and Azan magazines has now appeared (at least online) in Springer’s Security Informatics journal. Here’s the citation:

“Language Use in the Jihadist Magazines Inspire and Azan”
David B. Skillicorn and Edna F. Reid
Security Informatics, 2014, 3:9

The paper examines the intensity of various kinds of language in these jihadist magazines. The main conclusions are:

  • These magazines use language as academic models of propaganda would predict, something that has not been empirically verified at this scale AFAIK.
  • The intellectual level of these magazines is comparable to other mass market magazines — they aren’t particularly simplistic, and they assume a reasonably well-educated readership.
  • The change in editorship/authorship after the deaths of Al-Awlaki and Samir Khan is clearly visible in Inspire. The new authors have changed with each issue, but there is an overarching similarity. Azan has articles attributed to many different authors, but the writing style is similar across all articles and issues; so it’s either written by a single person or by a tightly knit group.
  • Jihadist language intensity has been steadily increasing over the past few issues of Inspire, after being much more stable during the Al-Awlaki years (this is worrying).
  • Inspire is experimenting with gamification strategies to increase motivation for lone-wolf attacks and/or to blunt the reality of causing deaths and casualties. It’s hard to judge whether this is being done deliberately or by osmosis — the levels of gamification language waver from issue to issue.

ISIS is putting out its own magazine. Its name, “Islamic State News”, and the fact that it is entirely pictorial (comic or graphic novel, depending on your point of view) say something about their view of the target audience.

Pull from data versus push to analyst

One of the most striking things about the discussion of the NSA data collection that Snowden has made more widely known is the extent to which the paradigm for its use is database-oriented. Both the media and, more surprisingly, the senior administrators talk only about using the data as a repository: “if we find a cell phone in Afghanistan we can look to see which numbers in the US it has been calling and who those numbers in turn call” has been the canonical justification. In other words, the model is: collect the data and then have analysts query it as needed.

The essence of data mining/knowledge discovery is exactly the opposite: allow the data to actively and inductively generate models with an associated quality score, and use analysts to determine which of these models is truly plausible and then useful. In other words, rather than having analysts create models in their heads and then use queries to see if they are plausible (a “pull” model), algorithmics generates models inductively and presents them to analysts (a “push” model). Since getting analysts to creatively think of reasonable models is difficult (and suffers from the “failure of imagination” problem), the inductive approach is both cheaper and more effective.

For example, given the collection of metadata about which phone numbers call which others, it’s possible to build systems that produce results of the form: here’s a set of phone numbers whose calling patterns are unlike any others (in the whole 500 million node graph of phones). Such a calling pattern might not represent something bad, but it’s usually worth a look. The phone companies themselves do some of this kind of analysis, for example to detect phones that are really business lines but are claiming to be residential and, in the days when long distance was expensive, to detect the same scammers moving across different phone numbers.
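As a toy illustration of this kind of inductive, push-style analysis (the call records and the two features below are invented for the sketch; real systems work over the full graph structure, not just per-phone counts):

```python
from collections import defaultdict
import numpy as np

# Invented call records: 30 ordinary phones each call 3 numbers,
# plus one phone, X, that calls 50 distinct numbers once each.
calls = [(f"P{i}", f"Q{(i + j) % 30}") for i in range(30) for j in range(3)]
calls += [("X", f"N{j}") for j in range(50)]

# One simple profile per calling phone: (calls made, distinct numbers called).
made = defaultdict(list)
for src, dst in calls:
    made[src].append(dst)
feats = {p: (len(ds), len(set(ds))) for p, ds in made.items()}

# Inductively flag phones whose profile is far (z-score > 2) from the rest;
# no analyst had to hypothesise "look for X" in advance.
M = np.array(list(feats.values()), dtype=float)
z = np.abs((M - M.mean(axis=0)) / (M.std(axis=0) + 1e-9))
outliers = [p for p, row in zip(feats, z) if row.max() > 2]
print(outliers)  # ['X']
```

The point is the direction of flow: the model (here, just “unusual calling profile”) is generated from the data and pushed to the analyst, rather than pulled out by a query the analyst had to think of first.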

I would hope that inductive model building is being used on collected data, and the higher-ups in the NSA either don’t really understand or are being cagey. But I’ve talked to a lot of people in government who collect large data but are completely stuck in the database model, and have no inkling of inductive modelling.
