### Second-stage predictors

If we have a ranking predictor, we can choose to put the boundary in a position where the false negative rate is zero (which is what we desperately need to do in an adversarial situation).

The side-effect of this, of course, is that the false positive rate is likely to be very large, so at first glance this seems crazy. But only if we think that the problem can be solved by a single predictor.

Suppose, instead, that we think of the problem as one to be solved in stages.

The first predictor is intended mostly to reduce the problem to a manageable size. The boundary is set so that the false negative rate is zero, knowing that this means that many, many records will be ranked as “possibly abnormal”.
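The zero-false-negative boundary can be chosen very simply: on data where some abnormal records are known, place it at the lowest anomaly score any abnormal record receives. A minimal sketch, assuming higher scores mean "more abnormal" (the function name and data here are illustrative, not from the text):

```python
def zero_fn_boundary(scores, labels):
    """Return the boundary that puts every known abnormal record
    (label 1) on the 'possibly abnormal' side: the minimum score
    among the abnormal records. Zero false negatives on this data."""
    return min(s for s, y in zip(scores, labels) if y == 1)

scores = [0.10, 0.20, 0.35, 0.40, 0.60, 0.90]
labels = [0,    0,    0,    1,    0,    1]      # 1 = known abnormal

boundary = zero_fn_boundary(scores, labels)     # 0.40
flagged = [s for s in scores if s >= boundary]  # includes a normal 0.60
```

Note the trade-off the text describes: the normal record scoring 0.60 is flagged too, so the false positive count grows as the price of missing nothing.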

Suppose that, even with this very conservative boundary, the first stage lets us discard the clearly normal records, so that the pool is now only 20% of its original size. We can now apply a more sophisticated predictor, still trying to model and predict normality, but one that can afford to spend more resources per record, and use a cleverer algorithm.

Again the result is a ranking of the records, into which we can insert a boundary such that the false negative rate is zero. This is still extremely conservative, so many records that are quite normal may still end up on the “possibly abnormal” side of the boundary, but the pool shrinks again. Even if only half of the records can be safely discarded as normal, we have still reduced the total to 10% of the original.

This staged approach can be continued for as many stages as required. Each new stage works with fewer records, and can be correspondingly more sophisticated. Eventually, the number of questionable records may become small enough that a human can do the final check.
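The staged approach can be sketched as a simple cascade: each stage pairs a scoring function with a boundary chosen for zero false negatives, discards everything below the boundary as normal, and hands the shrinking pool to the next, costlier stage. The names and toy data below are illustrative assumptions:

```python
def cascade(records, stages):
    """Run records through a sequence of (score, boundary) stages,
    keeping only those still ranked 'possibly abnormal' at each step."""
    pool = list(records)
    for score, boundary in stages:
        pool = [r for r in pool if score(r) >= boundary]
    return pool

# Toy example: records are plain numbers, "abnormality" is the value.
records = range(100)
stages = [
    (lambda r: r, 80),  # cheap first stage keeps the top 20%
    (lambda r: r, 90),  # second stage halves the pool: 10% remain
]
survivors = cascade(records, stages)  # the 10 records 90..99
```

In practice each later stage would use a different, more expensive predictor; only the shape of the pipeline matters here.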

Assembling predictors in sequence means that individually imperfect predictors can be arranged so that the overall effect is close to perfect.
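The arithmetic behind this claim is geometric shrinkage. Using fractions that echo the text's example (100% to 20% to 10%, then one more halving), three stages that are each far from accurate still leave only a twentieth of the original pool:

```python
# Fraction of the remaining pool each stage keeps (illustrative values).
keep_fractions = [0.20, 0.50, 0.50]

pool = 1.0
for f in keep_fractions:
    pool *= f
# pool is now 0.05: three imperfect stages leave 5% of the records,
# while the zero-false-negative boundaries mean nothing bad was lost.
```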

Of course, we have not yet quite solved the problem of detecting possible bad things. The pool of abnormal records contains both records associated with bad guys and records associated with other more ordinary kinds of abnormality (eccentricity).

At this point, two lines of attack remain. The first is to build a model of eccentricity, and use it to eliminate those records from the data. The other is to observe that most bad-guy actions require some kind of collective effort, so that bad-guy records are likely to form some kind of grouping, while those of eccentrics are likely to be one-offs. (Of course, this does not solve the problem of the lone-wolf bad guy, which is why serial killers can be so hard to detect.)
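The second line of attack can be sketched as a grouping step. Here `signature` is a hypothetical stand-in for whatever the analyst believes collective actors share (a common contact, destination, or pattern of activity); records that share a signature form candidate groups, while singletons look like one-off eccentricity:

```python
from collections import defaultdict

def groups_vs_oneoffs(records, signature):
    """Split abnormal records into groups sharing a signature
    (candidate collective activity) and one-off singletons
    (more likely ordinary eccentricity)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[signature(r)].append(r)
    groups = [g for g in buckets.values() if len(g) > 1]
    oneoffs = [g[0] for g in buckets.values() if len(g) == 1]
    return groups, oneoffs

# Toy data: (record id, shared contact).
recs = [("a1", "x"), ("a2", "x"), ("a3", "y"), ("a4", "z"), ("a5", "x")]
groups, oneoffs = groups_vs_oneoffs(recs, signature=lambda r: r[1])
# One group of three records sharing contact "x"; two one-offs.
```

As the text notes, this heuristic has nothing to say about a lone-wolf actor, whose record is indistinguishable from a one-off eccentric.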