Review of “Supercrunching” by Ian Ayres

This book is about the rise of data mining/knowledge discovery, with an overview of who is doing it, what they are doing, and how much data is required. Ayres makes these points:

  • Predictors typically outperform skilled humans at prediction tasks;
  • The combination of the internet as a test platform and the large amounts of data available makes randomized trials (i.e. controlled experiments) cheap and robust;
  • Humans don’t like being outperformed by predictors, and tend not to believe the comparative results, no matter how strong; doctors are especially prone to these attitudes, and teachers even more so.

These are all good points, they are well made, and there are many interesting and effective examples to back them up.

The book also spends some time, inexplicably, on the usefulness of standard deviations as modifiers of means.

The central points about the rise in both the capability and the quality of data mining are well made and relevant. However, the author’s own blind spots seriously weaken the case. First, when Ayres talks about predictors he means regression equations; he doesn’t seem to know much about any other kind of predictor except neural networks, which get a brief mention as if they were some new and wonderful invention (rather than standard techniques, for which the definitive book, Bishop’s Neural Networks for Pattern Recognition, appeared in 1995). What is, in a way, surprising is how much production use of regression there still is. Imagine how much better these users could do if they upgraded to more powerful algorithms.

Second, he doesn’t understand that data mining algorithms are themselves inductive. So he spends a chapter on the relationship between data mining and human intuition, arguing that the role of humans is to come up with the ideas for the models. In general, he, like many others, struggles to give a convincing explanation of the human role in data mining, and ends up saying some rather strange things.

The book is full of examples, and this is its big strength. Although technically simple, they are practically interesting because they usually use large datasets and have been deployed in, or at least have had an impact on, the real world.
