Posts Tagged 'ensemble'

Which predictors can rank?

To be able to build a ranking predictor, there must be some way of labelling the training records with (estimates of) their ranks, so that this property can be generalised to new records. This is often straightforward, even if the obvious target label doesn’t map directly to a ranking.

There are six mainstream prediction technologies:

  1. Decision trees. These are everyone’s favourites, but they are quite weak predictors, and can only be used to predict class labels. So no use for ranking.
  2. Neural networks. These are also well-liked, but undeservedly so. Neural networks can be effective predictors for problems where the boundaries between the classes are difficult and non-linear, but they are horrendously expensive to train. They should not be used without soul searching. They can, however, predict numerical values and so can do ranking.
  3. Support Vector Machines. These are two-class predictors that try to fit the optimal boundary (the maximal margin) between points corresponding to the records from each class. The distances from the boundary are an estimate of how confident the classifier is in the classification of each record, and so provide a kind of surrogate ranking: from large positive numbers down to 1 for one class and then from -1 to large negative numbers for the other class.
  4. Ensembles. Given any kind of simple predictor, a better predictor can be built by: creating samples of the records from the training dataset; building individual predictors from each sample; and the use the collection of predictors as single, global predictor by asking for the prediction of each one, and using voting to make the global prediction. Ensembles have a number of advantages, primarily that the individual predictors cancel out each others variance. But the number of predictors voting for the winning class can also be interpreted as a strength of opinion for that class; and so for a value on which to rank. In other words, a record can be unanimously voted normal, voted normal by all but one of the individual predictors, and so on.
  5. Random Forests. Random Forests are a particular form of ensemble predictor where each component decision tree is built making decisions about internal tests in a particularly robust and contextualized way. This makes them one of the most powerful prediction technologies known. The same technique, using the number of votes for the winning class, or the margin between the most popular and the next most popular class can be used as a ranking.
  6. Rules. Rules are used because they seem intuitive and explanatory, but they are very weak predictors. This is mostly because they capture local structure in data, rather than global structure captured by most other predictors. Rules cannot straightforwardly be used to rank.

So, although ranking predictors are very useful in adversarial situations, they are quite difficult to build and use.