Posts Tagged 'counterterrorism'

Call for Papers: Link Analysis, Counterterrorism and Security

The Call for the LACTS 2009 workshop is now available here.

The workshop takes place at the SIAM Data Mining Conference and brings together academics, practitioners, law enforcement, and intelligence people to talk about leading-edge work in the area of adversarial data analysis.

The workshop is intended primarily for early-stage work. The proceedings are published electronically, but authors may retain copyright.

The deadline for submissions is probably late December, but perhaps a little later (still being decided).

Knowledge Discovery for Counterterrorism and Law Enforcement

My new book, Knowledge Discovery for Counterterrorism and Law Enforcement, is out. You can buy a copy from:

The publisher’s website

Amazon.

(Despite what these pages say, the book is available or will be within a day or two.)

As the holiday season approaches, perhaps you have a relative who’s in law enforcement, or intelligence, or security? What could be better than a book! Or maybe you’d like to buy one for yourself.

(A portion of the price of this book goes to support deserving university faculty.)

“Whatever it is, I’m against it”

I’ve been reading Bobbitt’s new book “Terror and Consent” which has a lot to say about the adversarial setting, obviously with an emphasis on its role at a state level. I thoroughly recommend this book.

One of the points he makes suggests a new line of attack. He argues that, over the past five centuries, terrorism has taken the form of the state it opposes. Today, that means that this century’s terrorist groups, for which al Qaeda is a prototype, will tend to be globalized, multinational, and inclined to privatise and outsource.

But this means that, over a long period of time, there have always been those who oppose the existing state in an active, terror-based way, regardless of what that state was like, how honorable or moral it was, or what opportunities there were to change the situation from within. It is this group of people I mean to suggest in the title (which is a quotation from Groucho Marx).

Work has been done on understanding radicalization, but from the perspective of “understanding”: grievances, social issues, or whatever. But Bobbitt’s framework suggests that there’s a more general form of, for want of a better word, radicalization, whose drivers we don’t understand but which seem to be independent of the social context.

Of course, it’s not obvious that it would have been the same people acting as terrorists in all of these periods. But the fact that we don’t know shows that there’s something to be learned. If some people join terrorist groups for reasons that are deeply unconnected to the reasons why such groups exist, there is a whole new class of opportunities to detach or subvert them.

Using private documents to improve search in public documents

I’m back from the SIAM International Conference on Data Mining, and the 5th Workshop on Link Analysis, Counterterrorism, and Security, which I helped to organize. The workshop papers are now online, along with some open problems that were discussed at the end of the workshop.

I’ll post about some ideas that were tossed around at the workshop and conference in the next few days.

Let me start by talking about the work of Roger Bradford. Information retrieval starts from a document-term matrix, which is typically extremely large and sparse, and then reduces the dimensionality by using a singular value decomposition (SVD), a process sometimes called latent semantic indexing. This creates a representation space for both documents and terms. A query is treated as if it were a kind of short document and mapped into this representation space. Its near neighbours are then the documents retrieved in response to the query, and they can be ranked in order of increasing distance from the query point.
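To make the mechanics concrete, here is a minimal sketch of latent semantic indexing on a toy corpus. It is an illustration only, not Bradford's implementation: the four documents, the choice of k = 2 dimensions, and the use of cosine similarity as the measure of closeness are all assumptions made for the example.

```python
import numpy as np

# Toy corpus; the documents and vocabulary are invented for illustration.
docs = [
    "wire transfer bank account",
    "bank account fraud",
    "garden flowers soil",
    "soil flowers water",
]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}


def term_vector(text):
    """Raw term-count vector over the fixed vocabulary (unknown words are ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v


# Term-document matrix: one row per term, one column per document.
A = np.column_stack([term_vector(d) for d in docs])

# Truncated SVD: keep only the k largest singular values (k is an assumed choice).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]


def project(v):
    """Map a term vector into the k-dimensional representation space."""
    return (Uk.T @ v) / sk


# Every document becomes a point in the reduced space.
doc_points = np.array([project(A[:, j]) for j in range(A.shape[1])])


def retrieve(query):
    """Treat the query as a short document; rank documents from nearest to farthest."""
    q = project(term_vector(query))
    norms = np.linalg.norm(doc_points, axis=1) * np.linalg.norm(q) + 1e-12
    sims = doc_points @ q / norms
    return [int(i) for i in np.argsort(-sims)]


ranking = retrieve("fraud transfer")
```

In this toy corpus the two financial documents come out ahead of the two gardening documents for the query "fraud transfer", even though the first document does not contain the word "fraud": the reduced space links terms that occur in related documents.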

Bradford showed that the original space can be built using a set of private documents and a set of public documents, and that the resulting representation space allows better retrieval performance than the space derived from the public documents, without allowing the properties of the private documents to be inferred.

In fact, the set of private documents can be diluted by mixing them with other documents before the process starts, making it even more difficult to work backwards to the private documents.
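The shape of the idea can be sketched as follows: the representation space is built from public and private documents together, but only the public documents are placed in the searchable index. This is a rough illustration under invented data, not the method from the paper; the documents, the helper names `build_space` and `search`, and k = 2 are all hypothetical. The sketch also says nothing about whether private content can be inferred from the space, which is the substantive claim of the paper.

```python
import numpy as np

# All documents below are invented for illustration.
public_docs = [
    "shipment arrives at port",
    "device parts ordered online",
    "garden show opens",
]
private_docs = [
    "device shipment intercepted at port",
    "device parts shipment",
]


def build_space(training_docs, k):
    """Build a rank-k LSI space from training_docs and return a projection function."""
    vocab = sorted({w for d in training_docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}

    def term_vector(text):
        v = np.zeros(len(vocab))
        for w in text.split():
            if w in index:
                v[index[w]] += 1.0
        return v

    A = np.column_stack([term_vector(d) for d in training_docs])
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    return lambda text: (Uk.T @ term_vector(text)) / sk


# The space is trained on public and private documents together...
project = build_space(public_docs + private_docs, k=2)

# ...but only the public documents are placed in the searchable index.
public_points = np.array([project(d) for d in public_docs])


def search(query):
    """Rank the public documents by cosine similarity to the query."""
    q = project(query)
    norms = np.linalg.norm(public_points, axis=1) * np.linalg.norm(q) + 1e-12
    sims = public_points @ q / norms
    return [public_docs[int(i)] for i in np.argsort(-sims)]


results = search("device shipment")
```

Diluting the private set, as described above, would correspond to adding further unrelated documents to the training list passed to `build_space`.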

This process has a number of applications that he talks about in the paper. One of the most interesting is that it allows different organizations, for example allies, to share sensitive information without compromising it to each other — and still get the benefits of the relationships in the full set of documents.

Workshop on Link Analysis, Counterterrorism, and Security

If you’re interested in the content of this blog, and you live in the Atlanta area, you might be interested in coming to LACTS, the Workshop on Link Analysis, Counterterrorism, and Security. It’s being held on April 26th (Saturday) as part of the SIAM International Data Mining Conference. A one-day registration deal is available.

The proceedings will also be available online, both via my website and from SIAM after the workshop.

Here is the schedule:

0825-0830: Introduction
Antonio Badia and David Skillicorn

0830-0900: Detecting Hidden Passages in Documents
Saket S.R. Mengle and Nazli Goharian

0900-0930: Exploiting Sensitive Information in Background Mode using Latent Semantic Indexing
R. B. Bradford

0930-1000: Topic Detection Using Independent Component Analysis
Scott Grant, David Skillicorn, and James R. Cordy

1000-1030: Coffee Break

1030-1100: Using AI for Sensemaking in Investigative Analysis
Summer Adams, Ashok K. Goel, and Neha Sugandh

1100-1130: Vulnerability Assessment on Adversarial Organization: Unifying Command and Control Structure Analysis and Social Network Analysis
Il-Chul Moon, Kathleen M. Carley, and Alexander H. Levis

1130-1200: Torus Graph Inference for Detection of Localized Activity
Elizabeth A. Beer, Carey E. Priebe, and Edward R. Scheinerman

1200-1330: Lunch (on your own)

1330-1430: Workshop Keynote: “The Road to Link Intelligence”
Sherry Marcus, 21st Century Technologies.

1430-1500: Enhancing the Automated Analysis of Criminal Careers
Tim K. Cocx, Walter A. Kosters, and Jeroen F.J. Laros

1500-1530: Summarization and Information Loss in Network Analysis
Jamie F. Olson and Kathleen M. Carley

1530-1545: Summing Up
Antonio Badia and David Skillicorn

Doing prediction in adversarial settings

The overall goal of prediction in adversarial settings is to stop bad things happening — terrorist attacks, fraud, crime, money laundering, and lots of other things.

People intuitively think that the way to address this goal is to try to build a predictor for the bad thing. A few moments’ thought shows that building such a predictor is a very difficult, maybe impossible, thing to do. So some people immediately conclude that it’s silly, or a waste of money, to try to address such goals using knowledge discovery.

There are a couple of obvious reasons why direct prediction won’t work. The first is that bad guys have a very large number of ways in which they can achieve their goal, and it’s impossible for the good guys to consider every single one in designing the predictive model.

This problem is very obvious in intrusion detection, trying to protect computer systems against attacks. There are two broad approaches. The first is to keep a list of bad things, and block any of them when they occur. This is how antivirus software works — every day (it was every week; soon it will be every hour) new additions to the list of bad things have to be downloaded. Of course, this doesn’t predict so-called zero-day attacks, which use some mechanism that has never been used before and so is not on the list of bad things. The second approach is to keep track of what has happened before, and prevent anything new from happening (without some explicit approval by the user). The trouble is that, although there are some regularities in what a user or a system does every day, there are always new things — new websites visited, email sent to new addresses. As a result, alarms are triggered so often that it drives everyone mad, and such systems often get turned off. Vista’s user authorization is a bit like this.
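The two approaches, and the way each one fails, can be sketched in a few lines. This is a deliberately simplified illustration, not how any real antivirus or intrusion-detection product works; the event names and the contents of both sets are hypothetical.

```python
# Approach 1: keep a list of known bad things and block only those.
known_bad = {"worm_a", "trojan_b"}  # hypothetical signature names


def signature_blocks(event):
    """Block the event only if it is already on the list of bad things."""
    return event in known_bad


# Approach 2: remember what has happened before and flag anything new.
seen_before = {"mail.example.com", "news.example.com"}  # hypothetical history


def anomaly_flags(event):
    """Raise an alarm for anything that has not been seen before."""
    return event not in seen_before


# A zero-day attack is not on the list, so the first approach lets it through.
zero_day_missed = not signature_blocks("zero_day_x")

# A harmless new website has never been seen, so the second approach raises an alarm.
benign_flagged = anomaly_flags("newsite.example.org")
```

The first approach fails by letting new attacks through; the second fails by raising an alarm on ordinary new behaviour.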

The other difficulty with using direct prediction is making it accurate enough. Suppose that there are two categories that we want to predict: good, and bad. A false positive is when a good record is predicted to be bad; and a false negative is when a bad record is predicted to be good. Both kinds of wrong predictions are a problem, but in different ways. A false positive causes annoyance and irritation, and generates extra work, since the record (and the person it belongs to) must be processed further. However, a false negative is usually much worse — because it means that a bad guy gets past the prediction mechanism.
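As a small illustration of the two kinds of error, the sketch below counts them for a handful of records. The labels and predictions are invented, and `count_errors` is a hypothetical helper written for this example.

```python
def count_errors(actual, predicted):
    """Return (false_positives, false_negatives) for good/bad labels."""
    # False positive: a good record predicted to be bad.
    fp = sum(1 for a, p in zip(actual, predicted) if a == "good" and p == "bad")
    # False negative: a bad record predicted to be good.
    fn = sum(1 for a, p in zip(actual, predicted) if a == "bad" and p == "good")
    return fp, fn


# Invented example: five records with their true labels and predictions.
actual = ["good", "good", "bad", "good", "bad"]
predicted = ["good", "bad", "bad", "good", "good"]

false_positives, false_negatives = count_errors(actual, predicted)
```

Here the second record is a false positive (extra work and annoyance) and the fifth is a false negative (a bad guy who gets through).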

Prediction technology is considered to be doing well if it achieves a prediction accuracy of around 90% (the percentage of records predicted correctly). It would be fabulous if it achieved an accuracy of 99%. But when the number of records is 1 million, a misclassification rate of 1% is 10,000 records! The consequences of this many mistakes would range from unworkable to catastrophic.
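The arithmetic is worth spelling out, along with one further step. The first two figures come from the paragraph above; the split into 100 bad records and the assumption that the 1% error rate applies equally to good and bad records are hypothetical, chosen only to show how the mistakes might divide.

```python
records = 1_000_000
error_rate_percent = 1

# From the text: 1% of 1 million records are misclassified.
misclassified = records * error_rate_percent // 100  # 10,000

# Hypothetical assumption: only 100 of the records are actually bad,
# and the predictor is wrong on 1% of each class.
bad = 100
good = records - bad

false_positives = good * error_rate_percent // 100  # good records flagged as bad
false_negatives = bad * error_rate_percent // 100   # bad records that slip through
true_positives = bad - false_negatives

# Everything the predictor flags, and the share of it that is really bad.
flagged = true_positives + false_positives
share_really_bad = true_positives / flagged
```

Under these assumed numbers, almost 10,000 good records are flagged in order to catch 99 bad ones, so fewer than 1 in 100 flagged records is really bad, and one bad record still gets through.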

These problems with prediction have been pointed out in the media and in some academic writing, as if they meant that prediction in adversarial settings is useless. This is a bit of an argument against a straw man. What is needed is a more thoughtful way of thinking about how prediction should be done, which I’ll talk about in the next posting.

Looking for Bad Guys III: Using manipulation

Bad guys who are aware that knowledge-discovery tools will be used to look for them may also try to actively manipulate the process to their own advantage.

One way to do this is to get an insider working for them, someone who can alter the data or the results of the analysis to their benefit. This is probably the most common method: over all of history, probably more sieges have been successful because someone opened the gates from the inside than because the walls were broken through. It’s easy to get caught up in the cleverness of technology and forget that sometimes suborning someone is the easiest attack.

However, the focus of this blog is knowledge discovery, so let me concentrate on that. Before we talk about how manipulation can be exploited as a discovery tool, we need to talk about what manipulation looks like; and before we can do that, we need to think about the structure of the knowledge-discovery process.

It’s helpful to divide up the stages of knowledge discovery into:

  1. Collecting the data (CCTV images, transaction logs);
  2. Analysing the data (the part that’s usually thought of as the heart of knowledge discovery);
  3. Deciding on what to do with the results and taking action.

Although an adversary can only attack the process via the data that is collected (assuming they don’t have an insider), it is helpful to think of three different kinds of attacks, directed against each of the three stages. The different attacks require understanding different aspects of the knowledge-discovery system.

Manipulating the data collection stage is probably the easiest, because it’s often possible to see and understand how the data is being collected. For example, the fields of view of CCTV cameras can usually be inferred from their positions (even if they are enclosed in black plastic bubbles) and so ways to move around them without coming into view can be worked out. Alternatively, disguises can be used to conceal who is being seen, even though an image is captured. One of the reasons identity theft is a big business is that it provides a way to have data captured about you, but data that is useless because it doesn’t connect to the real you.

Manipulating the decision and action stage is done using social engineering. This means trying to create the impression in the minds of the people who are making the decisions and taking the actions that the analysis system has made an error.

Manipulating the analysis stage is surprisingly easy, much easier than it should be. This is because most knowledge-discovery technology has been tuned to give good results in data with natural variation. This gives an opportunity to insert data that is the worst possible from the point of view of the algorithms, and so enable bad guys to hide their traces.

The technology used for knowledge discovery needs to be completely rethought to take manipulation into account. This is primarily why adversarial knowledge discovery is not just another application of knowledge discovery, but a completely different problem.

The good part about this is that attempts at manipulation also create an abnormal signature in the data; and the process can be tuned to look for this signature as well.
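One simple way to picture both halves of this, the manipulation and the signature it leaves, is a detector that flags values lying far from the mean of past data. The detector, the threshold of three standard deviations, and all the numbers below are assumptions made for illustration; real systems and real attacks are more subtle.

```python
from statistics import mean, pstdev


def is_flagged(history, value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean of history."""
    mu = mean(history)
    sigma = pstdev(history)
    return abs(value - mu) > threshold * sigma


# Invented data with natural variation around 100.
clean = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]

# The record the adversary wants to hide.
suspicious = 150

# Extreme values inserted by the adversary to inflate the spread of the data.
injected = [10, 190, 10, 190]
poisoned = clean + injected

# Against clean data the suspicious record stands out.
flagged_before = is_flagged(clean, suspicious)

# After the insertions, the spread is so large that it no longer does.
flagged_after = is_flagged(poisoned, suspicious)

# But the inserted records are themselves wildly abnormal relative to the clean data.
injected_stand_out = all(is_flagged(clean, x) for x in injected)
```

The inserted records hide the suspicious one, but they are themselves far more unusual than it was when measured against the clean data, which is the kind of signature that can be looked for.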

What this blog is about

All of us leave traces in the data that we create, either intentionally or as a side-effect of the things we do in the world — walking in front of a CCTV camera, turning on a cell phone, or whatever.

Lots of this data is analyzed, for example by businesses that want to build a relationship with customers.

I’m interested in the special case where some of the people about whom data is collected want to hide their existence, what they are like, and what they are doing, usually because they are up to no good.

In such situations, the way in which the data is collected, and then analyzed, and then the decisions that are taken as a result have to be rethought to take account of the adversarial nature of the situation.

I’m interested in how to do knowledge discovery in these adversarial situations, and this blog will talk about the issues, the technologies, and some of the known results.

Adversarial situations include:

  • crime;
  • fraud (medical, insurance);
  • money laundering;
  • organizational malfeasance;
  • industrial espionage;
  • national defence; and
  • counterterrorism.

What bad guys do in these situations has huge costs. The cost of terrorism is obvious, but it’s less well known that fraud costs an estimated 12% of GDP in developed economies.

Of course, the process of collecting and analyzing data is not necessarily benign, and many people have privacy concerns. We’ll talk about them too.