Estimating the significance of a factoid

You only have to mention Palantir to attract lots of traffic — oops, I did it again 🙂

Those of you who’ve been following along know that I’m interested in tools that help an analyst decide how to treat a new piece of information that arrives in an analysis system from the outside world. Many analysis tools provide exactly nothing to support analysts with this task: new data arrives and is stored away in the system, but analysts can discover it only by issuing a query whose results happen to include the new data.

The next level of tool allows persistent queries: an analyst asks about some topic, and the system remembers the query. If new data appears that would have matched the query, the system notifies the analyst (think Google Alerts). This is a big step up in performance from an analyst’s point of view. Jeff Jonas has argued that, in fact, queries should be thought of as a symmetric form of data that can be accessed explicitly as well. For example, it may be significant for an analyst that another analyst has made the same or a similar query.
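A minimal sketch of the persistent-query idea, under loud assumptions: queries are plain keywords, records are plain strings, and matching is a case-insensitive substring test. The class and method names (`PersistentQueryStore`, `register`, `ingest`) are hypothetical and not taken from any real product. Because the stored queries are themselves data, `register` can also report which other analysts have already asked the same thing.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PersistentQueryStore:
    """Hypothetical store that remembers keyword queries per analyst."""

    # keyword (lower-cased) -> set of analysts who registered it
    subscribers: dict = field(default_factory=lambda: defaultdict(set))

    def register(self, analyst: str, keyword: str) -> set:
        """Remember the query and return the other analysts who already made it."""
        key = keyword.lower()
        others = set(self.subscribers[key])
        others.discard(analyst)
        self.subscribers[key].add(analyst)
        return others

    def ingest(self, record: str) -> dict:
        """Return {analyst: [matching keywords]} for a newly arrived record."""
        text = record.lower()
        alerts = defaultdict(list)
        for keyword, analysts in self.subscribers.items():
            if keyword in text:  # naive substring match, an assumption
                for analyst in analysts:
                    alerts[analyst].append(keyword)
        return dict(alerts)


store = PersistentQueryStore()
store.register("alice", "shipment")
overlap = store.register("bob", "Shipment")   # bob learns alice asked the same
alerts = store.ingest("Unusual shipment logged at the port")
print(overlap, alerts)
```

A real system would need a far richer notion of "matches" and of "similar query"; the sketch only shows that both kinds of notification fall out of keeping the queries around.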

However, this still requires analysts to manage their set of interests quite explicitly and over potentially long periods of time. We humans are not very good at managing the state of multiple mental projects, a fact that has made David Allen a lot of money. In the simplest case, if a system tells an analyst about a new result for a query that was originally made months ago, it may take a long time to recreate the situation that led to the original query being made, and so a long time to estimate the significance of the new information.

I don’t have a silver bullet to solve this class of problems. But I do think that it’s essential that tools become more proactive so that some of the judgement of how significant a newly arrived fact is can be made automatically and computationally. Of course, this is deeply contextual and very difficult.

It does seem helpful, though, to consider the spectrum of significance that might be associated with a new factoid. Let me suggest the following spectrum:

  • Normal. Such a factoid is already fully accounted for by the existing mental model or situational awareness of the analysis. Its significance is presumptively low. Often this can be estimated fairly well using the equivalences common = commonplace = normal. In other words, if it resembles a large number of previous normal factoids, then it’s a normal factoid.
  • Anomalous. Such a factoid lies outside the normal but is ‘so close’ that it is best accounted for as a small deviation from normal. It’s the kind of factoid for which a plausible explanation is easy to come up with in a very short time frame.
  • Interesting. Such a factoid calls into question the accuracy or completeness of the existing model or situational awareness — something has been missed, or the structure of the model is not what it appeared to be.
  • Novel. Such a factoid does not resemble any that were used to build the model or situational awareness in the first place, so its significance cannot be assessed in the current framework. The model must be incomplete in a substantial way.
  • Random. Stuff happens and some factoids will be so unusual that they have nothing to say about the existing model.

This is a spectrum, so there are no natural boundaries between these categories — and yet the actions that follow do depend on which of these five categories a factoid is placed in.
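To make the point about boundaries concrete, here is a minimal sketch that places a factoid on the spectrum. Everything in it is an assumption made for illustration: a factoid is reduced to a single number, "distance from normal" is measured in standard deviations from past observations, and the cut points in `CUTS` are arbitrary. Moving any cut point changes the category, and therefore the action, without anything about the factoid changing.

```python
from statistics import mean, stdev

# Hypothetical cut points, in standard deviations from the historical mean.
# Nothing in the spectrum itself dictates these values.
CUTS = [
    (1.0, "normal"),
    (2.0, "anomalous"),
    (4.0, "interesting"),
    (8.0, "novel"),
]


def categorize(value: float, history: list) -> str:
    """Place a numeric factoid on the five-way spectrum by its deviation."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        raise ValueError("history has no spread; deviation is undefined")
    deviation = abs(value - mu) / sigma
    for limit, label in CUTS:
        if deviation < limit:
            return label
    return "random"  # beyond every cut point


history = [10, 12, 11, 9, 10, 11, 10, 9]
for v in (10.5, 12, 13.5, 17, 40):
    print(v, categorize(v, history))
```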

What makes estimating the significance of a new factoid difficult is that significance is greatest for the middle categories, and lowest for the extremal ones. Neither normal nor random factoids are significant, while interesting and novel ones are the most significant. Many of the natural technologies, intrusion detection systems for example, tend to take a more monotonic view: the further a factoid lies from normal, the more significant it is taken to be. But we know several techniques for measuring significance that have the right qualitative properties, and these make it plausible that we can build systems that can present analysts with new factoids along with an indication of their presumptive significance.
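The qualitative shape described here can be sketched with a toy scoring function. This is not one of the techniques alluded to above, which the post does not name; it is only a hypothetical curve with the right properties. It assumes a non-negative `deviation` from normal and a made-up parameter `peak` marking where significance is highest. The score is zero for a perfectly normal factoid, rises to 1 at `peak`, and decays back toward zero for wildly unusual ones, whereas a monotonic score simply keeps growing.

```python
import math


def monotonic_score(deviation: float) -> float:
    """The 'more unusual means more significant' view."""
    return deviation


def hump_score(deviation: float, peak: float = 3.0) -> float:
    """Hypothetical non-monotonic score: 0 at deviation 0, 1 at `peak`,
    then decaying toward 0 as deviation grows."""
    if deviation < 0 or peak <= 0:
        raise ValueError("deviation must be >= 0 and peak must be > 0")
    x = deviation / peak
    return x * math.exp(1.0 - x)


for d in (0.0, 1.0, 3.0, 10.0, 30.0):
    print(f"{d:5.1f}  monotonic={monotonic_score(d):5.1f}  hump={hump_score(d):.3f}")
```

With these assumptions, a factoid thirty units from normal gets the highest monotonic score but a hump score close to zero, which matches the claim that random factoids say little about the existing model.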
