Posts Tagged 'sensemaking'

Time to build an artificial intelligence (well, try anyway)

NASA is a divided organisation, with one part running uncrewed missions, and the other part running crewed missions (and, apparently, internal competition between these parts is fierce). There’s no question that the uncrewed-mission part is advancing both science and solar system exploration, and so there are strong and direct arguments for continuing in this direction. The crewed-mission part is harder to make a case for; the ISS spends most of its time just staying up there, and has been reduced to running high-school science experiments. (Nevertheless, there is a strong visceral feeling that humans ought to continue to go into space, which is driving the various proposals for Mars.)

But there’s one almost invisible but extremely important payoff from crewed missions (no, it’s not Tang!): reliability. You only have to consider the difference in the level of response between the failed liftoff last week of a supply rocket to the ISS, and the accidents involving human deaths. When an uncrewed rocket fails, there’s an engineering investigation leading to a significant improvement in the vehicle or processes. When there’s a crewed rocket failure, it triggers a phase shift in the whole of NASA, something far more searching. The Grissom/White/Chaffee accident triggered a complete redesign of the way life support systems worked; the Challenger disaster caused the entire launch approval process to be rethought; and the Columbia disaster led to a new regime of inspections after reaching orbit. Crewed missions cause a totally different level of attention to safety, which in turn leads to a new level of attention to reliability.

Some changes only get made when there’s a really big, compelling motivation.

Those of us who have been flying across oceans for a while cannot help but be aware that 4-engined aircraft have almost entirely been replaced by 2-engined aircraft; and that spare engines strapped under a wing, being ferried to a remote location to replace one that failed, are never seen any more either. This is just one area where increased reliability has paid off. Cars routinely last for hundreds of thousands of kilometres, which only a few high-end brands used to do forty years ago. There are, of course, many explanations, but the crewed space program is part of the story.

What does this have to do with analytics? I think the time is right to try to build an artificial intelligence. The field known as “artificial intelligence” or machine learning or data mining or knowledge discovery has been notorious for overpromising and underdelivering since at least the 1950s, so I need to be clear: I don’t think we actually know enough to build an artificial intelligence — but I think it has potential as an organising principle for a large amount of research that is otherwise unfocused.

One reason why I think this might be the right time is that deep learning has shown how to solve (at least potentially) some of the roadblock problems (although I don’t know that the neural network bias in much deep learning work is necessary). Consciousness, which was considered an important part of intelligence, has turned out to be a bit of a red herring. Of course, we can’t tell what the next hard part of the problem will turn out to be.

The starting problem is this: how can an algorithm identify when it is “stuck” and “go meta”? (Yes, I know that computability informs us that there isn’t a general solution, but we, as humans, do this all the time — so how can we encode this mechanism algorithmically, at least as an engineering strategy?) Any level of success here would pay off immediately in analytics platforms that are really good at building models, and even assessing them, but hopeless at knowing when to revise them.
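To make this concrete, here is a minimal sketch of where such a mechanism would sit in a model-building loop. It is only a sketch: the `strategies` list, the `score` function, and the plateau test are placeholders of my own, not a claim about how the real mechanism should work.

```python
# Hedged sketch: a model-building loop that notices it is "stuck" and "goes meta".
# The strategy functions and scoring function are placeholders, not a real API.

def run_with_meta_level(strategies, initial_model, score, max_steps=1000,
                        patience=20, min_gain=1e-3):
    """Try to improve a model; when progress stalls, escalate to the next strategy."""
    model = initial_model
    best = score(model)
    level = 0                      # which strategy (meta-level) is currently in use
    stalled = 0                    # steps since the last meaningful improvement

    for _ in range(max_steps):
        candidate = strategies[level](model)   # one improvement step at this level
        new_score = score(candidate)
        if new_score > best + min_gain:
            model, best, stalled = candidate, new_score, 0
        else:
            stalled += 1
        if stalled >= patience:                # "stuck": go meta
            level += 1
            stalled = 0
            if level >= len(strategies):       # nothing left to escalate to
                break
    return model, best
```

The interesting, and unsolved, part is of course what the escalation steps should actually do; the loop only marks the place where “going meta” would have to happen.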

Compelling evidence on Benghazi timeline

Kathleen Carley presented work on the social media data flow before, during, and after the Benghazi embassy attack in September 2012. She happened to be teaching a course on analysis of social media (tweets and mainstream media) over the early part of September and was able to quickly repurpose it.

Her results show that, in Libya, there was no social media discussion of the embassy attacks until several hours after they happened. Discussion of the infamous movie also only begins well after the attacks and then only as a result of speculation about whether it played any role.

In contrast, Egyptian social media feeds were abuzz with demonstration rhetoric well before the activity in Cairo.

This seems to provide a compelling argument against any “spontaneous demonstration” scenario to explain what happened in Benghazi (if anyone still thinks that). It’s also a nice demonstration of the coming of age of real-time social media analysis, although it shows, too, that getting real-time analysis requires having a team in place beforehand.

The reference is: Near Real Time Assessment of Social Media Using Geo-Temporal Network Analytics, Kathleen M. Carley, Juergen Pfeffer, Huan Liu, Fred Morstatter, Rebecca Goolsby, Proceedings of Advances in Social Networks Analysis and Mining (ASONAM) 2013, ACM & IEEE, 517–524.

Does the 80:20 rule actually help?

Pareto was the first (well, the first to get his name attached) to notice that many systems have the property that 80% of the payoff comes from 20% of the effort.

Systems where the 80:20 rule holds are often touted in business books, the implication being that one should get the 80% of the total available payoff by putting in 20% of the effort, and then stop. Overall, of course, this produces a large benefit-cost ratio. However, this idea makes two huge assumptions:

  1. You know, in advance, which 20% of the effort leads to the 80% of the payoff; in other words, it’s possible to pick out the low hanging fruit in advance.
  2. The other 20% of the payoff doesn’t matter, and so can be ignored.

The first assumption is particularly unrealistic. It may well be that 80% of the commissions come from 20% of the sales calls, but this isn’t as useful as it sounds, because you generally don’t know which 20%. Maybe it’s possible to detect the high hanging fruit: those calls that are long shots, with a lower probability of success.

The second assumption presumes that you are looking at a pool of independent tasks, and that you can leave any number of them undone.

But what of the more realistic scenario where the tasks are related, they all need to be done, and it’s difficult or impossible to decide on the reward for completing any particular one of them?

There are two extremal possibilities. The first is that all of the easy tasks come at the beginning, so that the payoff versus effort curve looks like this:

[Figure: payoff versus effort when the easy tasks come first]
Here the payoff comes early, with the first 20% of the effort producing 80% of the result. But all good things come to an end, and once this first 20% of effort has been expended, an awful lot of effort is left. And notice that the payoff per unit of effort drops by a factor of 16 (!!) from the easy section to the hard section.

The second extremal possibility looks like this:

[Figure: payoff versus effort when the hard tasks come first]
This, of course, is really daunting because the first 80% of the effort leads to only 20% of the payoff. These tasks may never get done because the first steps seem so unrewarding that they are given up on. So although tasks with the 80:20 property are often labelled as attractive, some of them at least may be among the most unattractive possible.

Of course, most real tasks lie somewhere between these extremes, with regions that are steep and regions that are flat. The real force of knowing that a task is 80:20 is that it informs you that the effort required for different parts of the task can vary by as much as a factor of 16. It may be useful to know this pattern in advance to avoid discouragement.

In contrast, a task with a 60:40 ratio varies in difficulty by only a factor of 2¼. Since any task that has to be completely done has the property that 100% of the effort is required to obtain 100% of the payoff, the differences in these ratios are really about expected variations in the apparent difficulty at different moments in the process. Less useful than it’s often touted, but useful in a different way.
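For the record, the arithmetic behind those factors is easy to check. A small calculation (mine, purely illustrative):

```python
# Payoff per unit of effort in the "easy" part versus the "hard" part of a p:q task.
def steepness_ratio(payoff_pct, effort_pct):
    easy = payoff_pct / effort_pct                   # e.g. 80 / 20 = 4
    hard = (100 - payoff_pct) / (100 - effort_pct)   # e.g. 20 / 80 = 0.25
    return easy / hard

print(steepness_ratio(80, 20))   # 16.0  -> the factor of 16 for an 80:20 task
print(steepness_ratio(60, 40))   # 2.25  -> the factor of 2 1/4 for a 60:40 task
print(steepness_ratio(50, 50))   # 1.0   -> a uniformly difficult task
```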

Terrorist incidents come in only a few flavors

Terrorist attacks are different in many ways: they take place in different countries, with different motivations behind them, using different mechanisms, and with varying degrees of success. But are there any commonalities that could be used, for example, to categorize them and so to defend against them in more focused ways? The answer is yes, there are large-scale similarities.

To do this analysis, I started from the Global Terrorism Database developed by START, the National Consortium for the Study of Terrorism and Responses to Terrorism. The database contains details of all incidents that meet their coding standards since the beginning of 1970, and I used the version released at the end of 2012. There was one major discontinuity where new fields were added, but overall the coding has been consistent over the entire 40+ year period.
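For those curious about the mechanics, the pipeline looks roughly like the sketch below. This is a reconstruction for illustration, not the code behind the figures: the file name and column subset are assumptions based on the public GTD codebook, and the scaling, projection, and clustering steps shown are generic stand-ins.

```python
# Hedged sketch of a clustering pipeline over a Global Terrorism Database export.
# File name and column choices are assumptions; the real analysis may differ.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

cols = ["attacktype1", "targtype1", "claimed", "nhostkid", "nkill", "nwound"]
gtd = pd.read_csv("globalterrorismdb.csv", usecols=cols, low_memory=False)
gtd = gtd.fillna(0)                              # missing counts treated as zero (a simplification)

X = StandardScaler().fit_transform(gtd)          # put the coded fields on comparable scales
coords = PCA(n_components=2).fit_transform(X)    # 2-d projection for plotting
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

# coords can then be scattered and coloured by cluster label, or by a single
# attribute, to produce overlay figures like the ones later in this post.
```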

The image below shows the clustering of all attacks over that time period:

[Figure: clustering of all attacks, with clusters labelled A–H]
The large structure looks like a hinge, with clusters A and B at the top, clusters C and D forming the hinge itself, and clusters E, F, G, and H at the bottom. There’s also a distinction between the clusters at the front (B, D, F, and H) and those at the back (A, C, E, and G). (You’ll have to expand the figure to see the labels clearly.)

The first thing to notice is that there are only 8 clusters and, with the exception of H which is quite diffuse, the clusters are fairly well defined. In other words, there are 8 distinctive kinds of terrorist attack (and only 8, over a very long time period).

Let’s dig into these clusters and see what they represent. The distinction between the front and the back is almost entirely related to issues of attribution: whether the attack was claimed, how clear that claim is (for example, are there multiple claims of responsibility for the same incident), and whether the incident should properly be counted as terrorism or something else (quasi-military, for example).

The structure of the hinge differentiates between incidents involving capturing people (hijackings or kidnappings in A and B) and incidents that are better characterized as attacks (C, D, E, F, G, H). The extremal ends of A and B (to the right) are incidents that lasted longer and/or where the ransom was larger.

The differences between C/D, E/F, and G/H arise from the number of targets (which seems to be highly correlated with the number of different nationalities involved). So C and D are attacks on a single target, E and F are attacks on two targets, and G and H are attacks on three targets. Part of the diffuse structure of H happens because claims are always murkier for more complex attacks and part because there is a small group of incidents involving 4 targets that appears, as you’d expect, even further down and to the right.

Here are some interesting figures which overlay the intensity of a property on the clustering, so that you can see how it’s associated with the clusters:

[Figure: overlay of whether each incident was claimed]

This figure shows whether the incident was claimed or not. The color coding runs from dark red to bright yellow; I’m not specifying the direction, because it’s complicated, but the contrast shows differences. In each case, the available color spectrum is mapped to the range of values.

[Figure: overlay of the number of hostages or kidnap victims]

This figure shows the differences between incidents where there were hostages or kidnap victims and those where there weren’t.

[Figure: overlay of the country in which the incident took place]
This figure shows that the country in which the incident took place is mostly unrelated to other properties of the incident; in other words, attacks are similar no matter where they take place.

This analysis shows that, despite human variability, those designing terrorist incidents choose from a fairly small repertoire of possibilities. That’s not to say that there couldn’t be attacks in which some people are also taken hostage; rather that those doing the planning don’t seem to conceptualize incidents that way, so when it happens it’s more or less by accident. Perhaps some kind of Occam’s razor plays a role: planning an incident is already difficult, so there isn’t a lot of brainpower left over to try for extra cleverness, and there’s probably also a perception that complexity increases risk.

Questions are data too

In the followup investigation of the Boston Marathon bombings, we see again the problem that data analytics has with questions.

Databases are built to store data. But, as Jeff Jones has most vocally pointed out, simply keeping the data is not enough in adversarial settings. You also need to keep the questions, and treat them as part of the ongoing data. The reason is obvious once you think about it — intelligence analysts need not only to know the known facts; they also need to know that someone else has asked the same question they just asked. Questions are part of the mental model of analysts, part of their situational awareness, but current systems don’t capture this part and preserve it so that others can build on it. In other words, we don’t just need to connect the dots; we need to connect the edges!

Another part of this is that, once questions are kept, they can be re-asked automatically. This is immensely powerful. At present, an analyst can pose a question (“has X ever communicated with Y?”), get a negative answer, only for information about such a communication to arrive a microsecond later and not be noticed. In fast changing environments, this can happen frequently, but it’s implausible to expect analysts to remember and re-pose their questions at intervals, just in case.
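A minimal sketch of what keeping and re-asking questions might look like (the class and names here are invented for illustration, not drawn from any real system):

```python
# Hedged sketch: questions stored as standing queries and re-asked as new data arrives.
# QuestionStore and everything in it are invented names for illustration only.

class QuestionStore:
    def __init__(self):
        self.questions = []            # (analyst, description, predicate) triples
        self.records = []

    def ask(self, analyst, description, predicate):
        """Record the question itself, then answer it against the current data."""
        self.questions.append((analyst, description, predicate))
        return [r for r in self.records if predicate(r)]

    def add_record(self, record):
        """New data: re-ask every stored question, and notify on new hits."""
        self.records.append(record)
        for analyst, description, predicate in self.questions:
            if predicate(record):
                print(f"Notify {analyst}: new answer to {description!r}: {record}")

store = QuestionStore()
store.ask("analyst-1", "has X ever communicated with Y?",
          lambda r: {"X", "Y"} <= set(r.get("parties", [])))
# A matching communication arriving later is noticed automatically:
store.add_record({"type": "call", "parties": ["X", "Y"]})
```

The point is not the code, which is trivial, but the shift it represents: the question outlives its first answer and keeps working on the analyst’s behalf.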

We still have some way to go with the tools and techniques available for intelligence analysis.

Understanding High-Dimensional Spaces

My new book with the title above has been published by Springer, just in time for Christmas gift giving for the data miner on your list.

The book explores how to represent high-dimensional data (which almost all data is), and how to understand the models, particularly for problems where the goal is to find the most interesting subset of the records. “Interesting”, of course, means different things in different settings; a big part of the focus is on finding outliers and anomalies.

Partly the book is a reaction to the often unwitting assumption that clouds of data can be understood as if they had a single centre — for example, much of the work on social networks.

The most important technical ideas are (a) that clusters themselves need to be understood as having a structure which provides each one with a higher-level context that is usually important to make sense of them, and (b) that the empty space between clusters also provides information that can help to understand the non-empty space.
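Purely by way of illustration (this is my sketch on synthetic data, not an excerpt from the book), one simple way of giving clusters that higher-level context is to build a skeleton over the cluster centres, so that each cluster is seen in relation to its neighbours and the distances across the empty space between them become information in their own right.

```python
# Hedged sketch: a "skeleton" connecting cluster centres, giving each cluster context.
# The data is synthetic; the constructions in the book itself are more sophisticated.
import numpy as np
from sklearn.cluster import KMeans
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 10))
               for c in (0.0, 3.0, 6.0, 9.0)])          # four clusters in 10 dimensions

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
centres = km.cluster_centers_

# Skeleton: a minimum spanning tree over the inter-centre distances.
dist = squareform(pdist(centres))
skeleton = minimum_spanning_tree(dist).toarray()
for i, j in zip(*np.nonzero(skeleton)):
    print(f"cluster {i} -- cluster {j}: distance {skeleton[i, j]:.2f}")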

You can buy the book here.

The Analysis Chasm

I’ve recently heard a couple of government people (in different countries) complain about the way in which intelligence analysis is conceptualized, and so how intelligence organizations are constructed. There are two big problems:

1. “Intelligence analysts” don’t usually interact with datasets directly, but rather via “data analysts”, who aren’t considered “real” analysts. I’m told that, at least in Canada, you have to have a social science degree to be an intelligence analyst. Unsurprisingly (at least for now) people with this background don’t have much feel for big data and for what can be learned from it. Intelligence analysts tend to treat the aggregate of the datasets and the data analysts as a large black box, and use it as a form of Go Fish. In other words, intelligence analysts ask data analysts “Have we seen one of these?”; the data analysts search the datasets and the models built from them, and write a report giving the answer. The data analyst doesn’t know why the question was asked, and so cannot write the more helpful report that would be possible given some knowledge of the context. Neither side is getting as much benefit from the data as they could, and it’s mostly because of a separation of roles that developed historically, but makes little sense.

2. Intelligence analysts, and many data analysts, don’t understand inductive modelling from data. It’s not that they don’t have the technical knowledge (although they usually don’t), but that they don’t have the conceptual mindset to understand that data can push models to analysts: “Here’s something that’s anomalous and may be important”; “Here’s something that only occurs a few times in a dataset where all behavior should be typical and so highly repetitive”; “Here’s something that has changed since yesterday in a way that nothing else has”. (The sketch after this list shows the flavor of such pushing.) Data systems that do inductive modelling don’t have to wait for an analyst to think “Maybe this is happening”. The role of an analyst changes from being the person who has to think up hypotheses, to being the person who has to judge hypotheses for plausibility. The first task is something humans aren’t especially good at, and it’s something that requires imagination, which tends to disappear in a crisis or under pressure. The second task is easier, although not something we’re necessarily perfect at.
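Here is a minimal sketch of that kind of pushing, on synthetic data and with scikit-learn’s IsolationForest standing in for whatever inductive modelling a real system would actually use:

```python
# Hedged sketch: an inductive model pushing "here's something anomalous" to analysts.
# Synthetic data and IsolationForest are stand-ins for a real system's data and models.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
typical = rng.normal(0, 1, size=(1000, 5))   # behaviour that should be highly repetitive
odd = rng.normal(6, 1, size=(5, 5))          # a handful of records that are different
records = np.vstack([typical, odd])

model = IsolationForest(random_state=0).fit(records)
scores = model.score_samples(records)        # lower score = more anomalous

worst = np.argsort(scores)[:5]               # push the most anomalous records forward
for idx in worst:
    print(f"Record {idx}: anomaly score {scores[idx]:.3f} - flagged for an analyst to judge")
```

In this setting the analyst is left with only the second, easier task: judging whether the flagged records are plausible leads.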

There simply is no path for inductive models from data to get to intelligence analysts in most organizations today. It’s difficult enough to get data analysts to appreciate the possibilities; getting models across the chasm, unsolicited, to intelligence analysts is (to coin a phrase) a bridge too far.

Addressing both of these problems requires a fairly revolutionary redesign of the way intelligence analysis is done, and an equally large change in the kind of education that analysts receive. And it really is a different kind of education, not just a kind of training, because inductive modelling from data seems to require a mindset change, not the supply of some missing mental information. Until such changes are made, most intelligence organizations are fighting with one and a half arms tied behind their collective backs.