Posts Tagged 'prediction'

“But I don’t have anything to hide”

This is the common response of many ordinary people when the discussion of (especially) government surveillance programs comes up. And they’re right, up to a point. In a perfect world, innocent people have nothing to fear from government.

The bigger problem, in fact, comes from the data collected and the models built by multinational businesses. Everyone has something to hide from them: the bottom line prices we are willing to pay.

We have not yet quite reached the world of differential pricing. We’ve become accustomed to the idea that the person sitting next to us on a plane may have paid (much) less for the identical travel experience, but we haven’t quite become reconciled to the idea that an online retailer might be charging us more for the same product than they charge other people, let alone that the chocolate bar at the corner store might be more expensive for us. If anything, we’re inclined to think that an organisation that has lots of data about us and has built a detailed model of us might give us a better price.

But it doesn’t require too much prescience to see that this isn’t always going to be the case. The seller’s slogan has always been “all that the market will bear”.

Any commercial organization, under the name of customer relationship management, is building a model of your predicted net future value. Its actions towards you are driven by how large this value is. Any benefits and discounts you get now are offered in the expectation that, over the long haul, the organization will reap corresponding benefits and more. It’s inherently an adversarial relationship.
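
As a rough illustration of what such a model boils down to (the figures, horizon, and retention probability below are purely illustrative, not any particular company’s method), the predicted net future value of a customer is essentially a discounted sum of expected future margins, offset against the cost of whatever incentives are offered now:

```python
# Sketch: a customer's predicted net future value as a discounted sum of
# expected annual margins, minus the cost of the incentive offered today.
# All figures are invented for illustration.
def predicted_net_future_value(expected_annual_margin, retention_prob,
                               discount_rate, years, upfront_incentive):
    value = -upfront_incentive
    survival = 1.0                       # probability the customer is still a customer
    for t in range(1, years + 1):
        survival *= retention_prob
        value += survival * expected_annual_margin / (1 + discount_rate) ** t
    return value

# A $50 discount now is "worth it" if the modelled future margins exceed it.
print(predicted_net_future_value(expected_annual_margin=120, retention_prob=0.8,
                                 discount_rate=0.05, years=10,
                                 upfront_incentive=50))
```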

Now think about the impact of data collection and modelling, especially with the realization that everything collected is there for ever. There’s no possibility of an economic fresh start, no bankruptcy of models that will wipe the slate clean and let you start again.

Negotiation relies on the property that each party holds back their actual bottom line. In a world where your bottom line is probably better known to the entity you’re negotiating with than it is to you, can you ever win? Or even win-win? Now tell me that you have nothing to hide.

[And, in the ongoing discussion of post-Snowden government surveillance, there’s still this enormous blind spot about the fact that multinational businesses collect electronic communication, content and metadata; location; every action on portable devices and some laptops; complete browsing and search histories; and audio around any of these devices. And they’re processing it all extremely hard.]

Time to build an artificial intelligence (well, try anyway)

NASA is a divided organisation, with one part running uncrewed missions, and the other part running crewed missions (and, apparently, internal competition between these parts is fierce). There’s no question that the uncrewed-mission part is advancing both science and solar system exploration, and so there are strong and direct arguments for continuing in this direction. The crewed-mission part is harder to make a case for; the ISS spends most of its time just staying up there, and has been reduced to running high-school science experiments. (Nevertheless, there is a strong visceral feeling that humans ought to continue to go into space, which is driving the various proposals for Mars.)

But there’s one almost invisible but extremely important payoff from crewed missions (no, it’s not Tang!): reliability. You only have to consider the differences in the level of response between the failed liftoff last week of a supply rocket to the ISS, and the accidents involving human deaths. When an uncrewed rocket fails, there’s an engineering investigation leading to a significant improvement in the vehicle or processes. When there’s a crewed rocket failure, it triggers a phase shift in the whole of NASA, something that’s far more searching. The Grissom/White/Chaffee accident triggered a complete redesign of the way life support systems worked; the Challenger disaster caused the entire launch approval process to be rethought; and the Columbia disaster led to a new regime of inspections after reaching orbit. Crewed missions cause a totally different level of attention to safety, which in turn leads to a new level of attention to reliability.

Some changes only get made when there’s a really big, compelling motivation.

Those of us who have been flying across oceans for a while cannot help but be aware that 4-engined aircraft have almost entirely been replaced by 2-engined aircraft; and that we no longer see a spare engine strapped underneath a wing, being ferried to some remote location to replace one that failed. This is just one area where increased reliability has paid off. Cars routinely last for hundreds of thousands of kilometres, which only a few high-end brands used to do forty years ago. There are, of course, many explanations, but the crewed space program is part of the story.

What does this have to do with analytics? I think the time is right to try to build an artificial intelligence. The field known as “artificial intelligence” (or machine learning, or data mining, or knowledge discovery) has been notorious for overpromising and underdelivering since at least the 1950s, so I need to be clear: I don’t think we actually know enough to build an artificial intelligence — but I think it has potential as an organising principle for a large amount of research that is otherwise unfocused.

One reason why I think this might be the right time is that deep learning has shown how to solve (at least potentially) some of the roadblock problems (although I don’t know that the neural network bias in much deep learning work is necessary). Consciousness, which was considered an important part of intelligence, has turned out to be a bit of a red herring. Of course, we can’t tell what the next hard part of the problem will turn out to be.

The starting problem is this: how can an algorithm identify when it is “stuck” and “go meta”? (Yes, I know that computability informs us that there isn’t a general solution, but we, as humans, do this all the time — so how can we encode this mechanism algorithmically, at least as an engineering strategy?) Any level of success here would pay off immediately in analytics platforms, which are already really good at building models, and even at assessing them, but hopeless at knowing when to revise them.
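
Here is a tiny, deliberately crude sketch of what even a first step in that direction might look like inside an analytics platform: monitor a deployed model’s recent error rate and flag when it has drifted well beyond its training-time error. The class, window size, and tolerance below are invented for illustration; they are not a proposal for actually solving the “go meta” problem.

```python
from collections import deque

class RevisionMonitor:
    """Flag when a deployed model's recent error rate drifts well above its
    training-time error -- a crude stand-in for 'knowing you are stuck'."""

    def __init__(self, training_error, window=500, tolerance=2.0):
        self.training_error = training_error
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)   # rolling record of recent mistakes

    def record(self, prediction, actual):
        self.recent.append(0.0 if prediction == actual else 1.0)

    def should_revise(self):
        if len(self.recent) < self.recent.maxlen:
            return False   # not enough evidence yet
        recent_error = sum(self.recent) / len(self.recent)
        return recent_error > self.tolerance * self.training_error
```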

If you see something, say something — and we’ll ignore it

I arrived on a late evening flight at a Canadian airport that will remain nameless, and I was the second person into an otherwise deserted Customs Hall. On a chair was a cloth shoulder bag and a 10″ by 10″ by 4″ opaque plastic container. Being a good citizen, I went over to the distant Customs officers on duty and told them about it. They did absolutely nothing.

There are lessons here about predictive modelling in adversarial settings. The Customs officers were using, in their minds, a Bayesian predictor, which is the way that we, as humans, make many of our predictions. In this Bayesian predictor, the prior probability that the ownerless items contained explosives was very small, so the overall probability that they should act was also very small — and so they didn’t act.

Compare this to the predictive model used by firefighters. When a fire alarm goes off, they don’t consider a prior at all. That is, they don’t weigh factors such as: a lot of new students have just arrived in town, or a hoax call was answered at this location an hour ago, or anything else of the same kind. They respond regardless of whether they consider it a ‘real’ fire or not.
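
To see why the two decision rules diverge so sharply, here is the arithmetic with invented numbers (every probability below is purely illustrative): even if an unattended bag is far more likely when there is a real device, a tiny prior keeps the posterior tiny, so the informal Bayesian rule never reaches its action threshold, while the firefighter rule acts on the trigger alone.

```python
# Illustrative numbers only -- the point is the shape of the arithmetic.
p_threat = 1e-8               # prior: probability an unattended bag is a real device
p_bag_given_threat = 0.9      # an unattended bag is very likely if there is a device
p_bag_given_no_threat = 1e-3  # unattended bags are common and almost always innocent

p_bag = p_bag_given_threat * p_threat + p_bag_given_no_threat * (1 - p_threat)
posterior = p_bag_given_threat * p_threat / p_bag
print(f"posterior probability of a real threat: {posterior:.1e}")   # about 9e-6

# Informal Bayesian rule: act only if the posterior clears some threshold;
# with a prior this small it never does, so nothing happens.
# Firefighter rule: the trigger fired, so act, whatever the prior says.
bayesian_acts = posterior > 0.01
firefighter_acts = True
print(bayesian_acts, firefighter_acts)   # False True
```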

The challenge is how to train front-line defenders against acts of terror to use the firefighter predictive model rather than the Bayesian one. Clearly, there’s still some distance to go.

Making recommendations different enough

One of the major uses of data analytics in practice is to make recommendations, either explicitly or implicitly. This is one area where the interests of marketers and the interests of consumers largely run together. If I want to buy something (a product or access to an experience such as listening to a song or seeing a movie) I’d just as soon buy something I actually want — so a seller who can suggest something that fits has a better chance of getting my business than one that presents a set of generic choices. Thus businesses build models of me to try to predict what I am likely to buy.

Some of these businesses are middlemen, and what they are trying to predict is what kind of ads to show me on behalf of other businesses. Although this is a major source of revenue for web-based businesses, I suspect it to be an edge-case phenomenon — that is, the only people who actually see ads on the web are people who are new to the web (and there are still lots of them every day), while those who’ve been around for a while develop blindness to everything other than content they actually want. Businesses keep developing new ways to make their ads more obtrusive but, having trained us to ignore ads, they often find that the effort backfires.

Other businesses use permission marketing, which they can do because they already have a relationship. This gives them an inside track — they know things about their customers that are hard to get from more open sources, such as what they have actually spent money on. When they can analyse the data available to them effectively, they are able to create the win-win situation where what they want to sell me is also what I want to buy.

But there’s a huge hole in the technology that represents an opportunity at least as large as that on which Google was founded: how new and different should the suggestion be?

For example, if you buy a book from Amazon by a popular author, your recommendation list is populated by all of the other books by that same author, the TV programs based on that author’s books, the DVDs of the TV shows, and on and on. This is largely a waste of the precious resource of quality attention. Most humans are capable of figuring out that, if they like one book by an author, they may well like others. (In fact, sites like Amazon are surprisingly bad at figuring out that, when an author writes multiple series featuring different characters and settings, an individual might like one series but not necessarily the others.)

So what would be better? The goal, surely, is to suggest products that are similar to the ones the individual has already liked or purchased, but also sufficiently different that that individual would not necessarily have noticed them. In other words, current systems present products in a sphere around the existing product, but what they should do is present products in an annulus around the existing product. Different, but not too different.

This is surprisingly difficult to do. Deciding what similarity means is already a difficult problem; deciding what “just enough dissimilarity” means has been, so far, too difficult a problem. But what an opportunity!
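
As a concrete sketch of the annulus idea, suppose items are already represented as vectors (embeddings of their descriptions or purchase histories, say). A conventional recommender effectively keeps the most similar candidates; an annulus recommender keeps only those whose similarity falls inside a band. Everything below (the vectors, the band, the function name) is invented for illustration, and choosing the band well is exactly the unsolved part.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def annulus_recommendations(liked_item, candidates, low=0.4, high=0.75, k=5):
    """Recommend items similar to a liked item, but not too similar.

    Conventional recommenders effectively use the 'sphere' (high, 1.0];
    the annulus keeps only candidates whose similarity lies in (low, high).
    """
    scored = [(cosine(liked_item, vec), name) for name, vec in candidates.items()]
    in_band = [(s, name) for s, name in scored if low < s < high]
    return [name for s, name in sorted(in_band, reverse=True)[:k]]

# Illustrative usage with made-up 3-dimensional item vectors.
liked = np.array([1.0, 0.2, 0.1])
catalogue = {
    "another book by the same author": np.array([0.95, 0.25, 0.1]),  # too similar
    "a different author, same genre":  np.array([0.5, 0.8, 0.2]),    # in the annulus
    "gardening tools":                 np.array([0.05, 0.1, 0.9]),   # too different
}
print(annulus_recommendations(liked, catalogue))   # ['a different author, same genre']
```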

Election winning language patterns

One of the Freakonomics books makes the point that, in football aka soccer, a reasonable strategy for a goalkeeper facing a penalty kick is to stay in the middle of the goal rather than diving to one side or the other — but goalkeepers hardly ever follow this strategy because they look like such fools if it doesn’t pay off. Better to dive, and look as if you tried, even if it turns out that you dived the wrong way.

We’ve been doing some work on what kind of language to use to win elections in the U.S., and there are some similarities between the strategy that works and the goalkeeping strategy.

We looked at the language patterns of all of the candidates in U.S. presidential elections over the past 20 years, and a very clear language pattern for success emerged. Over all campaigns, the candidate who best deployed this language won, and the margin of victory relates quite strongly to how well the language was used (for example, Bush and Gore used this pattern at virtually identical levels in 2000).

What is this secret language pattern that guarantees success? It isn’t very surprising: high levels of positive words, non-existent levels of negative words, abstract words in preference to concrete ones, and complete absence of reference to the opposing candidate(s).
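
As a sketch of how such a pattern can be measured at all, one can score a speech against word lists for each of those categories. The mini-lexicons, weights, and example sentence below are invented for illustration; they are not the lexicons or the scoring actually used in the study.

```python
# Invented mini-lexicons for illustration; real studies use much larger lists.
POSITIVE = {"hope", "together", "future", "believe", "great"}
NEGATIVE = {"failure", "wrong", "crisis", "attack", "worst"}
ABSTRACT = {"freedom", "opportunity", "justice", "dream", "change"}
CONCRETE = {"tax", "bill", "pipeline", "tariff", "factory"}
OPPONENT = {"opponent", "rival"}   # plus the other candidate's name, in practice

def campaign_language_score(speech):
    words = [w.strip(".,!?").lower() for w in speech.split()]
    count = lambda lexicon: sum(w in lexicon for w in words)
    return (count(POSITIVE) - count(NEGATIVE)
            + count(ABSTRACT) - count(CONCRETE)
            - 2 * count(OPPONENT)) / max(len(words), 1)

print(campaign_language_score(
    "We believe in hope and opportunity, a future we build together."))
```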

What was surprising is how this particular pattern of success was learned and used. Although the pattern itself isn’t especially surprising, no candidate used it from the start; they all began with much more conventional patterns: negativity, content-filled policy statements, and comparisons between themselves and the other candidates. With success came a change in language, but the second part of the surprise is that the change happened, in every case, over a period of little more than a month. For some presidents, it happened around the time of their inaugurations; for others, around the time of their second campaign, but it was never a gradual learning curve. This suggests that what happens is not a conscious or unconscious improved understanding of what language works, but rather a change in their view of themselves that allows them to become more statesmanlike (good interpretation) or entitled (bad interpretation). The reason that sitting presidents are almost always re-elected is that they use this language pattern well in their second campaigns. (It’s not a matter of changing speechwriting teams or changes in the world and so in the topics being talked about — it’s almost independent of content.)

So there’s plenty of evidence that using language like this leads to electoral success but, just as for goalkeepers, no candidate can bring himself or herself to use it, because they’d feel so silly if it didn’t work and they lost.

Predicting the Future

Arguably the greatest contribution of computing to the total of human knowledge is that relatively simple results from theoretical models of computation show that the future is an inherently unknowable place — not just in practice, but for fundamental reasons.

A Turing machine is a simple model of computation with two parts: an infinite row of memory elements, each able to contain a single character; and a state machine, a simple device that is positioned at one of the memory elements, and makes moves by inspecting the single character in the current element and using a simple transition table to decide what to do next. Possible next moves include changing the current character to another one (from a finite alphabet), or moving one element to the right or to the left, or stopping and turning off.

A Turing machine is a simple device; all of its parts are straightforward; and many real-world simulators have been built. But it is hypothesised that this simple device can compute any function that can be computed by any other computational device, and so it contains within its simple structure everything that can be found in the most complex supercomputer, except speed.
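
Since the argument leans on how little machinery is involved, here is a minimal, generic simulator (a sketch; the transition-table encoding is just one convenient choice), together with a toy machine that flips every bit of its input and halts at the first blank:

```python
def run_turing_machine(tape, transitions, state="start", blank="_", max_steps=10_000):
    """A generic simulator: a (conceptually unbounded) tape, a head position,
    and a transition table mapping (state, symbol) -> (new_symbol, move, new_state)."""
    cells = dict(enumerate(tape))   # sparse tape; missing cells read as blank
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in transitions:   # no move defined: the machine halts
            return state, "".join(cells.get(i, blank) for i in range(min(cells), max(cells) + 1))
        new_symbol, move, state = transitions[(state, symbol)]
        cells[head] = new_symbol
        head += {"R": 1, "L": -1}[move]
    raise RuntimeError("gave up waiting -- in general there is no shortcut to just running it")

# A toy machine that flips every bit and halts at the first blank cell.
flipper = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
}
print(run_turing_machine("10110", flipper))   # ('start', '01001')
```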

It has been suggested that the universe is a giant Turing machine, and everything we know so far about physics continues to work from this perspective, with the single exception that it requires that time is quantized rather than continuous — the universe ticks rather than runs.

But here’s the contribution of computation to epistemology: almost nothing interesting about the future behaviour of a Turing machine is knowable in any shortcut way, that is, in any way that is quicker than just letting the Turing machine run and seeing what happens. This includes questions like: will the Turing machine ever finish its computation? Will it ever revisit this particular memory element? Will this element ever again contain the same symbol that it does now? And many others. (These questions may be answerable in particular cases, but they can’t be answered in general — that is, you can’t inspect the transition table and the storage and draw conclusions in a general way.)

If most of the future behaviour of such a simple device is not accessible to “outside” analysis, then almost every property of more complex systems must be equally inaccessible.

Note that this is not an argument built from limitations that might be thought of as “practical”. Predicting what will happen tomorrow is not, in the end, impossible because we can’t gather enough data about today, or because we don’t have the processing power to actually build the predictions — it’s a more fundamental limitation in the nature of what it means to predict the future. This limitation is akin (in fact, quite closely related) to the fact that, within any sufficiently powerful formal system, there are statements that are true but that cannot be proved within the system.

There are other arguments that also speak to the problem of predicting the future. These aren’t actually needed, given the argument above, but they are often advanced, and speak more to the practical difficulties.

The first is that non-linear systems are not easy to model, and often exhibit surprising behaviour that is not easy to infer even when we understand their components in detail. Famously, bamboo canes can suddenly appear from the ground and then grow more than 2 feet in a day.

The second is that many real-world systems are chaotic, that is, infinitesimal differences in their conditions at one moment in time can turn into enormous differences at a future time. This is why forecasting the weather is difficult: a small error in measurement at one weather station today (caused, perhaps, by a butterfly flapping its wings) can completely change tomorrow’s weather a thousand miles away. The problem with predicting the future here is that the current state cannot be measured to sufficient accuracy.
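
A standard illustration of this sensitivity (not from the post itself) is the logistic map: two starting values that differ only in the tenth decimal place track each other for a while and then bear no resemblance to one another, which is the weather-forecasting problem in miniature.

```python
# Logistic map x_{n+1} = r * x_n * (1 - x_n) in its chaotic regime (r = 4).
def trajectory(x0, steps=60, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = trajectory(0.2)
b = trajectory(0.2 + 1e-10)   # a "measurement error" in the tenth decimal place
for n in (0, 10, 20, 30, 40, 50):
    print(n, round(a[n], 6), round(b[n], 6))
# The two runs agree for roughly thirty steps, then diverge completely.
```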

So if the future is inherently, fundamentally impossible to predict, what do we mean when we talk about prediction in the context of knowledge discovery? The answer is that predictive models are not predicting a previously unknown future, but are predicting the recurrence of patterns that have existed in the past. It’s desperately important to keep this in mind.

Thus when a mortgage prediction system (should this new applicant be given a mortgage?) is built, it’s built from historical data: which of a pool of earlier applicants for mortgages did, and did not, repay those loans. The prediction for a new mortgage applicant is, roughly speaking, based on matching the new applicant to the pool of previous applicants and making a determination from what the outcomes were for those. In other words, the prediction assumes an approximate rerun of what happened before — “now” is essentially the same situation as “then”. It’s not really a prediction of the future; it’s a prediction of a rerun of the past.
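
In its crudest form, this matching is just a nearest-neighbour lookup over the historical pool. The features, records, and applicant below are invented, and real systems wrap the same idea in more elaborate models, but the historical-replay character is the same.

```python
# A caricature of "prediction as a rerun of the past": find the historical
# applicants most like the new one and report how their loans turned out.
# Features: (income in $k, debt-to-income ratio, years in current job). Invented data.
history = [
    ((85, 0.25, 6), "repaid"),
    ((42, 0.55, 1), "defaulted"),
    ((60, 0.35, 3), "repaid"),
    ((38, 0.60, 2), "defaulted"),
    ((95, 0.20, 10), "repaid"),
]

def predict(applicant, pool, k=3):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(pool, key=lambda rec: dist(applicant, rec[0]))[:k]
    outcomes = [outcome for _, outcome in nearest]
    return max(set(outcomes), key=outcomes.count)   # majority verdict of the look-alikes

print(predict((58, 0.40, 4), history))
```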

All predictive models have (and must have) this historical replay character. Trouble starts when this gets forgotten, and models are used to predict scenarios that are genuinely in the future. For example, in mortgage prediction, a sudden change in the wider economy may be significant enough that the history that is wired into the predictor no longer makes sense. Using the predictor to make new lending decisions becomes foolhardy.

Other situations have similar pitfalls, but they are a bit better hidden. For example, the dream of personalised medicine is to be able to predict the outcome for a patient who has been diagnosed with a particular disease and is being given a particular treatment. This might work, but it assumes that every new patient is close enough to some of the previous patients that there’s some hope of making a plausible prediction. At present, this is foundering on the uniqueness of each patient, especially as the available pool of existing patients for building the predictor is often quite limited. Without litigating the main issue, models that attempt to predict future global temperatures are vulnerable to the same pitfall: previous dependencies of temperatures on temperatures at earlier times do not provide a solid epistemological basis for predicting future temperatures from temperatures now (especially given the triple whammy of fundamental unpredictability, chaos, and non-linear systems).

All predictors should be built so that every prediction passes through a preliminary step that compares the new record to the totality of the data used to build the predictor. New records that do not resemble the records used for training cannot legitimately be passed to the predictor, since the result has a strong probability of being fictional. In other words, the fact that a predictor was built from a particular set of training data must be preserved in the predictor’s use. Of course, there’s an issue of how similar a new record must be to the training records to be plausibly predicted. But at least this question should be asked.
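
A minimal version of that preliminary step might look like the wrapper below, assuming a scikit-learn-style model with a predict method and assuming the training records (or a summary of them) are still available at prediction time. The distance threshold is the hard, unanswered part, and the one here is invented.

```python
import numpy as np

class GatedPredictor:
    """Wrap a model so that records unlike anything in the training data are
    refused, rather than silently given a prediction that is probably fictional."""

    def __init__(self, model, training_X, max_distance):
        self.model = model
        self.training_X = np.asarray(training_X, dtype=float)
        self.max_distance = max_distance   # how far is "too far"? the hard question

    def predict(self, record):
        record = np.asarray(record, dtype=float)
        nearest = float(np.min(np.linalg.norm(self.training_X - record, axis=1)))
        if nearest > self.max_distance:
            return None, f"refused: nearest training record is {nearest:.1f} away"
        return self.model.predict(record.reshape(1, -1))[0], "ok"
```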

So can we predict the future? No, we can only repredict the past.

You heard it here first

As I predicted on August 8th, Obama has won the U.S. presidential election. The prediction was made based on his higher levels of persona deception, that is, the ability to present himself as better and more wonderful than he actually is. Romney developed this a lot during the campaign, and the gap was closing, but it wasn’t enough.

On a side note, it’s been interesting to notice the emphasis in the media on factual deception, and the huge amount of fact checking that they love to do. As far as I can tell, factual deception has at best a tiny effect on political success, whether because it’s completely discounted or because the effect of persona is so much stronger. Based on the record, it seems to me a tough argument that Obama has been a successful president, and indeed I saw numerous interviews with voters who said as much — but then went on to say that they would still be voting for him. So I’m inclined to the latter explanation.