Why Data Science?

Data Science has become a hot topic lately. As usual, there’s not a lot of agreement about what data science actually is. I was on a panel last week, and someone asked afterwards what the difference was between data mining, which we’ve been doing for 15 years, and data science.

It’s a good question. Data science is a new way of framing the scientific enterprise in which a priori hypothesis creation is replaced by inductive modelling; and this is exactly what data mining/knowledge discovery is about (as I’ve been telling my students for a decade).

What’s changed, perhaps, is that scientists in many different areas have realised the existence and potential of this approach, and are commandeering it for their own.

I’ve included the slides from a recent talk I gave on this subject (at the University of Technology Sydney).

And once again let me emphasise that the social sciences and humanities did not really have access to the Enlightenment model of doing science (because they couldn’t do controlled experiments), but they certainly do have access to the new model. So expect a huge development in data social science and data humanities as soon as research students with the required computational skills move into academia in quantity.

Why data science (ppt slides)

Radicalization — it’s just a phase he’s going through

All of the discussion of radicalization in the past few weeks seems to assume that it’s a one-way process.

But if it’s a process with a large personality component (and evidence suggests it is); and if it’s a phenomenon associated with adolescence and young adulthood (which are times of attitudinal change anyway); and if the data fit models of infection by disease (and they do), then it seems plausible that, for many people, radicalization is a phase they go through. Such people will not be obtrusive because they never act on their (temporary) beliefs, and eventually cease to hold them. If radicalization can be a temporary phenomenon, then there’s de-radicalization, but there’s also post-radicalization; the first extrinsic, but the second intrinsic.

What’s the practical relevance? If some people “get over” their radicalization, then it argues for more gentle responses during the infected period. Actions such as interviews by security services with radicalized individuals and their relatives (a practice of MI5, and soon to be possible in Canada via bill C-51), and pulling passports may indeed have negative consequences if they make infected individuals become more entrenched (and less likely to become cured).

Of course, there are risks to a more gentle intervention strategy (and government departments are allergic to risks). But, for countries with exit controls, perhaps it’s better to rely on these than to act more explicitly; and at least the discussions about strategy should keep the possibility of cure in mind.

Multiculturalism’s role in radicalization

The children of immigrants have, historically, had two choices:

  1. Assimilate into the culture, retaining vestiges of their original culture (typically foods, celebrations, and perhaps a bias towards marrying cultural cognates);
  2. Remain part of an enclave of their original culture.

Option 1 is by far the most common. Option 2 only works when the original culture is itself highly organized, and it carries high risks for the immigrants. This option has often been followed by the Jewish diaspora (with obvious downsides, including periodic expulsions from European countries, and worse), but there are other examples. Note the wisdom of the formal Amish mechanism of rumspringa, which provides a choice point for young people to commit to the culture, or not.

In historical immigrations, these choices are clearly differentiated and there is little middle ground.

The invention of the idea of multiculturalism created the opportunity to move to a new country, assimilate (apparently), and preserve the culture of origin (apparently). This sounds like a great idea (“best of both worlds”) apart from one simple fact: it doesn’t work.

The set of those who have been radicalized in Western countries and either carried out attacks there, or made their way to the Middle East, is almost entirely made up of the children of immigrants. Many of these individuals have been studied and interviewed, and there is one clear pattern: feeling like they belonged neither in their original culture (their parents often being glad to have escaped it at some level and so having moved away from it), nor in the “new” culture in which they have grown up. Not feeling like they fit into the culture in which they grow up is, of course, a common teenage pattern, but most teenagers don’t have such a ready-made explanation for why they feel as they do. Multiculturalism, because it creates the apparent space to avoid commitment to one culture or the other, must bear some of the responsibility for radicalization. (This may also be part of the explanation for why engineers are so over-represented in the ranks of the radicalized: a tolerance for ambiguity may help those growing up between two cultures to navigate the difficult years of adolescence and young adulthood, and most engineers I know are more comfortable with black and white settings than with ambiguity.)

Of course, this can only be part of the explanation. One of the pitfalls for those who seek an explanation for radicalization is that, for every individual who becomes radicalized, there are 99 others who experienced apparently identical life trajectories (sometimes even siblings) without becoming radicalized, often without seeming to feel even the faintest pull towards radical ideologies. Personality must, therefore, play a huge part, and this is often underappreciated.

Religious does not equal stupid

A range of people, from David Brooks to Peter Bergen, have responded to the rhetoric associated with the Countering Violent Extremism Summit held in Washington this week. They point out that the motivation for Daish (aka ISIL [nobody knows why the U.S. White House, alone in the world, insists on this acronym] or ISIS) cannot be understood in terms of the American middle class and its aspirations: jobs, relationships, family, economic prosperity. Islam did not come boiling out of the deserts of Arabia in the 7th century because of the lack of economic opportunity in the sphere of camel raising.

But behind these misunderstandings lies a deeper one. Many of the elites in government and industry in Western countries think that people who have religious beliefs are either: stupid for believing something so self-evidently wrong; or devious and cunning in pretending to have religious beliefs as a tool for exerting power (in the best traditions of post-modernism). Now of course they don’t necessarily think this explicitly, but the language being used in much of the discussion of radicalization and its causes makes it fairly obvious that they do think this implicitly. In other words, one or other of these two views informs the way they frame the problem of islamist radicalism to themselves.

Why do sane young men (and women) give up a lifestyle in the West that, while often not perfect, is much better than third-world conditions and the prospect of death in Syria? Holding either of these misconceptions distorts the view of the problem, and of the West’s opponents, to the point of delusion. If you think your opponents must somehow be intellectually stunted to believe what they do, you are never going to understand why other people find these beliefs attractive, and so will never be able to craft a strategy to defend against islamist propaganda that has any chance of working. If you think your opponents are hypocritical and opportunistic (not believing their own message) then you will equally never be able to craft a working defence. The temptation is to think (again implicitly) that radicalization must somehow be a kind of mental illness; perhaps we’ll begin to see “solutions” with that flavour rather than the current socio-economic flavour, coming into vogue soon.

I don’t have a solution. But the evidence so far (and I’ve done some empirical work in this area) is that socio-economic explanations for radicalization do not go very far; and that de-radicalization programs (or early-stage counter-radicalization strategies) that start with this assumption are even less useful. A more nuanced, and more realistic, view of our opponents and their motivations is desperately needed.

[Added later: The weekend news programs, which were filled with post mortems on the Countering Violent Extremism meeting, were great examples of the misconceptions I suggested in this post. Fareed Zakaria actually made the claim that ISIS were faking their apparent beliefs to gain power. For what is, in my opinion, a more realistic view, see this article from The Atlantic: What ISIS Really Wants.]

Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda

I’ve just been working on assessing the potential impact of the three major magazines: Inspire (AQAP), Azan (Taliban), and Dabiq (ISIS), competing for the market in lone wolf jihadists in the West.

I compare these magazines using models for the intensity of informative, imaginative, deceptive, jihadist, and gamification language, and build an empirical model for propaganda which combines these into a single scale.

Unsurprisingly, Dabiq ranks highest in propaganda intensity.
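For readers who want a feel for what "combining several language-intensity scores into a single scale" can look like, here is a minimal sketch. It is not the model from the paper: the tiny word lists, the per-1000-words rate, and the z-score averaging are all assumptions made purely for illustration, and the names LEXICONS, intensity and combined_scale are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical mini-lexicons, invented for illustration only.
# The paper's actual models are not reproduced here.
LEXICONS = {
    "informative": {"because", "therefore", "report"},
    "imaginative": {"imagine", "dream", "glory"},
}


def intensity(text, lexicon):
    """Rate of lexicon words per 1000 whitespace-separated tokens."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,;:!?\"'") in lexicon)
    return 1000.0 * hits / len(tokens)


def combined_scale(texts):
    """Z-score each dimension across documents, then average the z-scores.

    `texts` maps a document name to its text. Equal weighting of the
    dimensions is an assumption of this sketch.
    """
    names = list(texts)
    scores = {name: 0.0 for name in names}
    for lexicon in LEXICONS.values():
        values = [intensity(texts[name], lexicon) for name in names]
        mu, sigma = mean(values), pstdev(values)
        for name, value in zip(names, values):
            z = 0.0 if sigma == 0 else (value - mu) / sigma
            scores[name] += z / len(LEXICONS)
    return scores


if __name__ == "__main__":
    demo = {
        "doc_a": "Imagine the glory, because it is near.",
        "doc_b": "Nothing of note happens in this text.",
    }
    print(combined_scale(demo))
```

Standardising each dimension before averaging keeps one dimension with naturally large counts from dominating the combined score; a real model would need validated lexicons and a principled weighting.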

The details can be found in the full draft paper, posted to SSRN:

Skillicorn, David, Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda (January 7, 2015). Available at SSRN: http://ssrn.com/abstract=2546478.

More subtle lessons from the Sony hack

There are some obvious lessons to learn from the Sony hack: perimeter defence isn’t much use when the perimeter has thousands of gates in it (it looks as if the starting point was a straightforward spearphishing attack); and if you don’t compartmentalise your system inside the perimeter, then anyone who gets past it has access to everything.

But the less obvious lesson has to do with the difference between our human perception of the difficulties of de-anonymization and aggregation, and the actual power of analytics to handle both. For example, presumably Sony kept data on their employees’ health in properly-protected HIPAA-compliant storage, but there were occasional emails that mentioned individuals and their health status. The people sending these emails presumably didn’t feel as if any particular one was a breach of privacy, since the private content in each one was small. But they failed to realise that all of these emails get aggregated, at least in backups. So now all of those little bits of information are in one place, and the risk of building significant models from them has increased substantially.

Anyone with analytic experience and access to a large number of emails can find structures that are decidedly non-obvious; but this is far from intuitive to the public at large, and hence to Sony executives.

We need to learn to value data better, and to understand in a deep way that the value of data increases superlinearly with the amount that is collected into a single coherent unit.
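One simple way to see why the growth might be superlinear is to count how many pairs of records can be cross-referenced once they sit in the same place. The sketch below assumes, purely for illustration, that value is roughly proportional to the number of such pairs; that proportionality is my assumption, not a measured result, and pair_count is a hypothetical helper name.

```python
from math import comb


def pair_count(n_records):
    """Number of distinct pairs that can be formed from n_records items."""
    return comb(n_records, 2)


# Doubling the number of records roughly quadruples the number of pairs.
for n in (10, 20, 40):
    print(n, pair_count(n))
```

Under this assumption, two separate stores of 10 records each offer 90 pairs in total, while a single store of 20 offers 190, which is the intuition behind treating an aggregated collection as worth more than the sum of its parts.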

