
The Analysis Chasm

I’ve recently heard a couple of government people (in different countries) complain about the way in which intelligence analysis is conceptualized, and hence the way intelligence organizations are constructed. There are two big problems:

1.  “Intelligence analysts” don’t usually interact with datasets directly, but rather via “data analysts”, who aren’t considered “real” analysts. I’m told that, at least in Canada, you have to have a social science degree to be an intelligence analyst. Unsurprisingly (at least for now), people with this background don’t have much feel for big data and for what can be learned from it. Intelligence analysts tend to treat the aggregate of the datasets and the data analysts as a large black box, and use it as a form of Go Fish. In other words, intelligence analysts ask data analysts “Have we seen one of these?”; the data analysts search the datasets and the models built from them, and write a report giving the answer. The data analyst doesn’t know why the question was asked and so cannot write the more helpful report that would be possible given some knowledge of the context. Neither side is getting as much benefit from the data as they could, mostly because of a separation of roles that developed historically but makes little sense.

2. Intelligence analysts, and many data analysts, don’t understand inductive modelling from data. It’s not that they don’t have the technical knowledge (although they usually don’t) but they don’t have the conceptual mindset to understand that data can push models to analysts: “Here’s something that’s anomalous and may be important”; “Here’s something that only occurs a few times in a dataset where all behavior should be typical and so highly repetitive”; “Here’s something that has changed since yesterday in a way that nothing else has”. Data systems that do inductive modelling don’t have to wait for an analyst to think “Maybe this is happening”. The role of an analyst changes from being the person who has to think up hypotheses, to the person who has to judge hypotheses for plausibility. The first task is something humans aren’t especially good at, and it’s something that requires imagination, which tends to disappear in a crisis or under pressure. The second task is easier, although not something we’re necessarily perfect at.
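To make the idea of data pushing findings to an analyst a little more concrete, here is a minimal sketch of the second example above: surfacing things that occur only a few times in a dataset where behavior should be highly repetitive. The function name `rare_events`, the `max_count` threshold, and the toy event log are all invented for illustration; a real system would use a much richer notion of "typical".

```python
from collections import Counter


def rare_events(events, max_count=2):
    """Return (value, count) pairs for values seen at most max_count times, rarest first."""
    counts = Counter(events)
    rare = [(value, n) for value, n in counts.items() if n <= max_count]
    # Sort by count, then by name, so the output order is deterministic.
    return sorted(rare, key=lambda pair: (pair[1], str(pair[0])))


# Toy log (an assumption): mostly repetitive behavior with a few oddities mixed in.
log = ["login"] * 50 + ["read"] * 40 + ["export_all"] + ["read"] * 10 + ["delete_audit"] * 2

# Nobody asked a question; the system volunteers what looks unusual.
for value, n in rare_events(log):
    print(f"Unsolicited finding: '{value}' seen only {n} time(s)")
```

The point is the direction of flow: the analyst never had to think "maybe someone is exporting everything" before the system raised it, and only has to judge whether the finding is plausible and important.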

There simply is no path for inductive models from data to get to intelligence analysts in most organizations today. It’s difficult enough to get data analysts to appreciate the possibilities; getting models across the chasm, unsolicited, to intelligence analysts is (to coin a phrase) a bridge too far.

Addressing both of these problems requires a fairly revolutionary redesign of the way intelligence analysis is done, and an equally large change in the kind of education that analysts receive. And it really is a different kind of education, not just a kind of training, because inductive modelling from data seems to require a mindset change, not the supply of some missing mental information. Until such changes are made, most intelligence organizations are fighting with one and a half arms tied behind their collective backs.

Estimating the significance of a factoid

You only have to mention Palantir to attract lots of traffic — oops, I did it again 🙂

Those of you who’ve been following along know that I’m interested in tools that help an analyst decide how to treat a new piece of information that arrives in an analysis system from the outside world. Many analysis tools provide exactly nothing to support analysts with this task: new data arrives and is stored away in the system, but analysts can discover it only by issuing a query whose result happens to include the new data.

The next level of tool allows persistent queries; so an analyst can ask about some topic, and the system remembers the query. If new data appears that would have matched the query, the system notifies the analyst (think Google Alerts). This is a big step up in performance from an analyst’s point of view. Jeff Jonas has argued that, in fact, queries should be thought of as a symmetric form of data that can be accessed explicitly as well. For example, it may be significant for an analyst that another analyst has made the same or a similar query.
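A rough sketch of what a persistent-query layer might look like follows. Everything here (`StandingQuery`, `AlertingStore`, matching queries by an identical description string) is a hypothetical design for illustration, not how any particular product works. It shows both halves of the idea: new records are checked against remembered queries, and the queries themselves are treated as data, so registering one reveals who else has asked the same thing.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StandingQuery:
    analyst: str
    description: str                 # human-readable statement of the interest
    matches: Callable[[dict], bool]  # predicate applied to each new record


@dataclass
class AlertingStore:
    records: list = field(default_factory=list)
    queries: list = field(default_factory=list)

    def register(self, query):
        """Remember the query; return other analysts already holding the same one."""
        peers = [q.analyst for q in self.queries
                 if q.description == query.description and q.analyst != query.analyst]
        self.queries.append(query)
        return peers

    def ingest(self, record):
        """Store a new record; return (analyst, description) pairs to notify."""
        self.records.append(record)
        return [(q.analyst, q.description) for q in self.queries if q.matches(record)]


def big_wire(record):
    # Illustrative predicate: wire transfers above an arbitrary amount.
    return record.get("type") == "wire" and record.get("amount", 0) > 10_000


store = AlertingStore()
store.register(StandingQuery("alice", "wire transfers over 10k", big_wire))
peers = store.register(StandingQuery("bob", "wire transfers over 10k", big_wire))
alerts = store.ingest({"type": "wire", "amount": 25_000})
print(peers)   # analysts who had already asked the same question
print(alerts)  # analysts to notify about the new record
```

Matching on an exact description is the crudest possible notion of "the same or a similar query"; a real system would need some measure of query similarity.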

However, this still requires analysts to manage their set of interests quite explicitly and over potentially long periods of time. We humans are not very good at managing the state of multiple mental projects, a fact that has made David Allen a lot of money. In the simplest case, if a system tells an analyst about a new result for a query that was originally made months ago, it may take a long time to recreate the situation that led to the original query being made, and so a long time to estimate the significance of the new information.

I don’t have a silver bullet to solve this class of problems. But I do think that it’s essential that tools become more proactive so that some of the judgement of how significant a newly arrived fact is can be made automatically and computationally. Of course, this is deeply contextual and very difficult.

It does seem helpful, though, to consider the spectrum of significance that might be associated with a new factoid. Let me suggest the following spectrum:

  • Normal.  Such a factoid is already fully accounted for by the existing mental model or situational awareness of the analysis. Its significance is presumptively low. Often this can be estimated fairly well using the equivalences common = commonplace = normal. In other words, if it resembles a large number of previous normal factoids, then it’s a normal factoid.
  • Anomalous. Such a factoid lies outside the normal but is ‘so close’ that it is best accounted for as a small deviation from normal. It’s the kind of factoid for which a plausible explanation is easy to come up with in a very short time frame.
  • Interesting. Such a factoid calls into question the accuracy or completeness of the existing model or situational awareness — something has been missed, or the structure of the model is not what it appeared to be.
  • Novel. Such a factoid does not resemble any that were used to build the model or situational awareness in the first place so its significance cannot be assessed in the current framework. The model must be incomplete in a substantial way.
  • Random. Stuff happens and some factoids will be so unusual that they have nothing to say about the existing model.

This is a spectrum, so there are no natural boundaries between these categories — and yet the actions that follow do depend on which of these five categories a factoid is placed in.

What makes estimating the significance of a new factoid difficult is that significance is greatest for the middle categories, and lowest for the extremal ones. Neither normal nor random factoids are significant, while interesting and novel ones are the most significant. Many of the natural technologies, for example intrusion detection systems, tend to take a more monotonic view, in which the more unusual something is, the more significant it is assumed to be. But we know several techniques for measuring significance that have the right qualitative properties, and these make it plausible that we can build systems that can present analysts with new factoids along with an indication of their presumptive significance.
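As a toy illustration of the qualitative shape being asked for (and not one of the techniques alluded to above), suppose each factoid already comes with an "unusualness" score between 0 and 1, say a normalized distance from the existing model. The cut points in `BANDS`, the bump-shaped `significance` function, and its `peak` and `width` parameters are all assumptions made up for this sketch; recall that the spectrum has no natural boundaries.

```python
import math

# Hypothetical cut points on an unusualness score in [0, 1]. Any fixed
# thresholds like these are a modelling choice, not something the data dictates.
BANDS = [(0.2, "normal"), (0.4, "anomalous"), (0.6, "interesting"),
         (0.8, "novel"), (1.0, "random")]


def category(unusualness):
    """Map an unusualness score to one of the five categories."""
    for upper, name in BANDS:
        if unusualness <= upper:
            return name
    return "random"


def significance(unusualness, peak=0.6, width=0.2):
    """Bump-shaped score: low at both extremes, highest near `peak`."""
    return math.exp(-((unusualness - peak) / width) ** 2)


def monotonic_significance(unusualness):
    """The 'more unusual means more significant' view, for contrast."""
    return unusualness


for score in (0.1, 0.3, 0.5, 0.7, 0.95):
    print(f"{score:.2f} {category(score):<12} "
          f"bump={significance(score):.2f} monotonic={monotonic_significance(score):.2f}")
```

Under the monotonic view the random factoid at 0.95 would be ranked above everything else, which is exactly the behavior an analyst does not want; the bump-shaped score puts the interesting and novel factoids at the top instead.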