Finding significance automatically

In a world where lots of data is collected and available, the critical issue for intelligence, law enforcement, fraud, and cybersecurity analysts is attention.

So the critical issue for tools to support such analysts is focus: how can the most significant and interesting pieces of data/information/knowledge be made the easiest to pay attention to?

This isn’t an easy issue to address for many reasons, some of which I talked about a few posts ago in the context of connecting the dots. But the fundamental problems are: (1) significance and interestingness are highly context dependent, so where to focus depends, in a complex way, on what the analyst already knows and understands; and (2) every new piece of information has the potential to alter the entire significance landscape in one hit.

Many existing tools try, underneath, to address the issue of focus indirectly, by giving analysts ways to control their own focus more directly. For example, there are many analyst platforms that allow available information to be sliced and diced in many different ways. These allow two useful things to be done: (1) dross (the guaranteed insignificant stuff) can be removed (or at least hidden from sight); and (2) the rest of the data can be rearranged in many different ways in the hope that human pattern-recognition skills can be brought to bear to find significance.

But it seems like a good idea to try to address the significance issue more directly. This has motivated a few of the research projects I’m involved with:

  • The ATHENS system tries to find information on the web that is probably new to the user, but which s/he is well-positioned to understand; in other words, the new information is just over the horizon from the user’s current landscape. It builds this new information starting from a query through which the user supplies the current context.
  • Finding anomalies in large graphs. Lots of data is naturally represented relationally as a graph, with nodes representing some kind of entities, and edges representing some kind of (weighted) similarity between pairs of nodes (e.g. social networks). Graphs are difficult to work with because they don’t really have a representation that humans can handle: even drawing nice pictures of them tends to (a) occlude chunks once the graph gets big enough, and (b) hide the anomalous structure in the corners, because the nice representation is derived from the big structure (e.g. the simple bits of the automorphism group). We’ve developed some tools that find and highlight anomalous regions, anomalous in the sense that, if you were to stand at their nodes and look at the landscape of the rest of the graph, it would look unusual.
  • Finding anomalies in text caused either by a desire to obfuscate the content that’s being talked about, or by an unusual internal mental state, such as being deceptive or highly tense.
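
To make the "stand at a node and look at the landscape" idea from the graph project a little more concrete, here is a minimal Python sketch. It is not the tools mentioned above: it is a stand-in that assumes the "view" from a node can be summarized by how many nodes sit at each hop distance, and that a view is unusual when it is far from the average view. The function names (distance_profile, anomaly_scores), the max_depth parameter, and the toy ring graph are all hypothetical.

```python
from collections import deque


def distance_profile(graph, start, max_depth):
    """Count how many nodes sit at each hop distance 1..max_depth from start."""
    dist = {start: 0}
    queue = deque([start])
    counts = [0] * max_depth
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not look beyond the horizon
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                counts[dist[nbr] - 1] += 1
                queue.append(nbr)
    return counts


def anomaly_scores(graph, max_depth=3):
    """Score each node by the Euclidean distance between its view and the mean view."""
    profiles = {n: distance_profile(graph, n, max_depth) for n in graph}
    size = len(profiles)
    mean = [sum(p[d] for p in profiles.values()) / size for d in range(max_depth)]
    return {
        node: sum((p[d] - mean[d]) ** 2 for d in range(max_depth)) ** 0.5
        for node, p in profiles.items()
    }


# Toy graph (an assumption for illustration): a ring of 8 nodes
# with one extra pendant node, 8, hanging off node 0.
ring = {i: {(i - 1) % 8, (i + 1) % 8} for i in range(8)}
ring[0].add(8)
ring[8] = {0}

scores = anomaly_scores(ring, max_depth=2)
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(sorted(top_two))  # prints [0, 8]
```

In this toy case the pendant node and the node it attaches to get the highest scores, while nodes on the far side of the ring, whose view is the ordinary one, score lowest. A real graph would need a much richer notion of "view" than hop counts.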

Some other people are working in similar directions. For example, there is some work aimed at using social processes to help discover significance. In a sense, sites like Slashdot work this way: each user provides some assessment of the quality/importance of some stories, and in return gets information about the quality/importance of other stories. This is also, of course, how refereed publications are supposed to work. The challenge is to contextualize this idea: what makes an object high quality/important for you may not mean anything to me. In other words, most significance lies somewhere on the spectrum from universally agreed to completely taste-based, and it’s hard to tell where, let alone compute it in a practical way.
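
One simple way to see the gap between universal and contextualized significance is to compare a plain average of everyone's ratings with an average weighted by how much each rater has agreed with me in the past. The sketch below does this in Python. It is an illustration under assumptions, not a description of how Slashdot or any other site works: ratings are assumed to lie between -1 and 1, agreement is assumed to be cosine similarity over shared stories, and the names (similarity, global_score, personalized_score) and the toy data are hypothetical.

```python
from math import sqrt


def similarity(a, b):
    """Cosine similarity over the stories both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = sqrt(sum(a[s] ** 2 for s in shared))
    norm_b = sqrt(sum(b[s] ** 2 for s in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def global_score(story, ratings):
    """Plain average of every rating the story has received."""
    votes = [r[story] for r in ratings.values() if story in r]
    return sum(votes) / len(votes) if votes else 0.0


def personalized_score(user, story, ratings):
    """Average of other users' ratings, weighted by their similarity to user."""
    mine = ratings[user]
    weighted, total = 0.0, 0.0
    for other, theirs in ratings.items():
        if other == user or story not in theirs:
            continue
        w = similarity(mine, theirs)
        weighted += w * theirs[story]
        total += abs(w)
    return weighted / total if total else 0.0


# Toy data (an assumption): ann shares my taste, bob has the opposite taste.
ratings = {
    "me": {"a": 1, "b": -1},
    "ann": {"a": 1, "b": -1, "c": -1},
    "bob": {"a": -1, "b": 1, "c": 1},
}

print(global_score("c", ratings))
print(personalized_score("me", "c", ratings))
```

Here the crowd as a whole is neutral about story "c", but the personalized score for "me" is strongly negative, because the one rater who agrees with me disliked it and the one who disagrees with me liked it. This only covers the taste-based end of the spectrum; it says nothing about where on the spectrum a given object actually sits.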

