Posts Tagged 'las vegas'

More on Identity

In the two most recent posts I mentioned the problem of figuring out when data records describe the same person. Casinos are required to ban certain people who have identified themselves as having a gambling problem, so they have to look carefully at everyone who books a room. They also, of course, have an interest in noticing when certain other people show up, for example card counters.

As I said yesterday, identity is a slippery thing to manage algorithmically. It’s only in the last century that governments have gotten into the act of certifying identity, via various forms of government-issued identification, going back to birth certificates.

Such documents are not necessarily very reliable. There’s a long history of forging them. But mostly identity gets fudged because people don’t use them directly — they copy names and addresses with characteristic human errors; and this process can be helped along by those who want to hide their identity. It’s socially acceptable to use variant names, and people constantly make mistakes with numbers. Those who want to can use these deniable mistakes to create multiple versions of their identities.
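
To make the "deniable mistake" idea concrete, here is a minimal sketch of how two records might be scored for similarity when names vary and digits get swapped. Everything in it is an assumption made for illustration: the field names (`name`, `phone`), the equal weighting, the 0.9 score for a single transposition, and the sample records are all made up, and real matching systems are far more elaborate.

```python
from difflib import SequenceMatcher


def digits_only(text):
    """Keep only the digits, so '702-555-0147' and '7025550147' compare equal."""
    return "".join(ch for ch in text if ch.isdigit())


def is_single_transposition(a, b):
    """True if b is a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def name_similarity(a, b):
    """Rough string similarity in [0, 1]; tolerant of variant forms of a name."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def phone_similarity(a, b):
    """Treat one swapped pair of digits as a near match (0.9 is an arbitrary choice)."""
    da, db = digits_only(a), digits_only(b)
    if da == db:
        return 1.0
    if is_single_transposition(da, db):
        return 0.9
    return SequenceMatcher(None, da, db).ratio()


def match_score(r1, r2):
    """Equal-weight blend of name and phone similarity (weights are illustrative)."""
    return 0.5 * name_similarity(r1["name"], r2["name"]) + 0.5 * phone_similarity(
        r1["phone"], r2["phone"]
    )


# Hypothetical records: the first two differ only by a name variant and a digit swap.
record_a = {"name": "Jonathan Smith", "phone": "702-555-0147"}
record_b = {"name": "Jon Smith", "phone": "702-555-0174"}
record_c = {"name": "Maria Lopez", "phone": "415-555-0199"}

print(match_score(record_a, record_b), match_score(record_a, record_c))
```

The point of returning a score rather than a yes/no answer is that each individual difference is plausible as an accident, so the most a matcher can honestly say is how likely it is that two records belong together.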

This is partly why there’s such an interest in biometrics. A biometric is an identity key that was given to you by God. The important distinction in biometrics is between a digital biometric and a non-digital one. A photo in a passport is a non-digital biometric — it can be used to associate the passport, and so its contents, with you, but doesn’t do much else. A digital biometric, such as a digitized photo, can act as a key to a large database of information about you.

Most biometrics are extremely easy to fool. You can read about some of the easy tricks here. Fingerprint scanners can be fooled by plastic wrap; iris scanners by printed photos of an iris.

In relationship/graph data, the problem with multiple records describing the same person is that they blur the structure of the connections around that person — making some paths seem longer, and some properties more diffuse. That’s why it’s important to be able to resolve identities when possible; but also why it’s important to stay agnostic over the long haul.
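
A small sketch of the blurring effect, under made-up assumptions: the node names and edges below are hypothetical, and `same_as` is simply a dictionary saying "treat this record as a duplicate of that one". The raw edges are never modified, so a merge can be tried, measured, and dropped again, which is one way of staying agnostic.

```python
from collections import deque


def build_graph(edges, same_as=None):
    """Build adjacency sets; `same_as` maps a record id to the id it is assumed to duplicate."""
    same_as = same_as or {}
    graph = {}
    for u, v in edges:
        u, v = same_as.get(u, u), same_as.get(v, v)
        graph.setdefault(u, set())
        graph.setdefault(v, set())
        if u != v:  # skip self-loops created by merging
            graph[u].add(v)
            graph[v].add(u)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of hops, or None if unreachable."""
    if start not in graph or goal not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical data: "j_smith_1" and "jon_smith_2" may be the same person.
edges = [
    ("alice", "j_smith_1"),
    ("j_smith_1", "carol"),
    ("carol", "dave"),
    ("dave", "jon_smith_2"),
    ("jon_smith_2", "bob"),
]

unresolved = build_graph(edges)
resolved = build_graph(edges, same_as={"jon_smith_2": "j_smith_1"})

print(shortest_path_length(unresolved, "alice", "bob"))
print(shortest_path_length(resolved, "alice", "bob"))
```

With the two records kept apart, alice and bob look distant; once they are treated as one person, alice and bob are both direct neighbors of the same node.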

More on Las Vegas

Las Vegas is an interesting example for those who think about adversarial knowledge discovery because it shows how little people really value privacy. The casinos leave their customers no privacy — their every action is captured while they’re in the casino, and even making a hotel reservation starts a chain of analysis in motion. Imagine the fuss if a government did anything remotely like this!

I mentioned Jeff Jonas yesterday. He has made two contributions to adversarial knowledge discovery:

  1. An agnostic approach to people’s identity in data. One of the problems of data analysis, particularly when the data comes from multiple sources, is putting together the attributes that belong to the same person. Usually some kind of key is used, perhaps a biometric, so that records that belong together can be discovered to belong together. If you are trying to hide some of your data, you want to confuse the key as much as possible.

     There are many ways to do this, but the best ones are ones that can be disavowed if the question ever comes up. So people who are trying to hide use variants of their names, mix up digits of phone numbers and street addresses, and anything else that can also happen accidentally. Some studies of criminal records have shown that nearly half have been altered in this kind of way.

     Jonas’s Non-Obvious Relationship Awareness system allows records that might belong to the same person to coexist. This kind of blurring matters most when the connections between people are analyzed, because it increases the distances between the nodes representing people. At that point the software can make an on-the-fly determination of how confident it is that a set of records represents the same person, without making the determination irreversible.
  2. The argument that any data-analysis system should treat the queries made to it as new data, to be added to the system and kept in it. This has two advantages. The first is that, if the answer to a query appears in the data after the query has been made, the answer can still be provided to the person who asked. The second is that a second query of the same kind can produce a response about the existence of the first query. In other words, one person can discover that someone else was asking the same question. Both of these are important and useful in adversarial settings, and should be considered for other data-analysis systems.
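
The second idea, keeping queries as data, can be sketched in a few lines. This is only an illustration of the principle and not a description of how Jonas's software works: the class name `PersistentQueryStore`, its methods, and the exact-match lookup on a single field are all assumptions.

```python
class PersistentQueryStore:
    """Toy store in which every query is remembered alongside the records."""

    def __init__(self):
        self.records = []  # list of dicts
        self.queries = []  # list of (asker, key, value) tuples

    def query(self, asker, key, value):
        """Return (matching records, earlier askers of the same question), then keep the query."""
        matches = [r for r in self.records if r.get(key) == value]
        earlier = [
            a for a, k, v in self.queries if (k, v) == (key, value) and a != asker
        ]
        self.queries.append((asker, key, value))
        return matches, earlier

    def add_record(self, record):
        """Store a record and return the askers whose stored queries it now answers."""
        self.records.append(record)
        return [a for a, k, v in self.queries if record.get(k) == v]


# Hypothetical usage: two analysts ask about the same phone number before it appears.
store = PersistentQueryStore()
first = store.query("analyst_a", "phone", "7025550147")
second = store.query("analyst_b", "phone", "7025550147")
to_notify = store.add_record({"name": "J. Smith", "phone": "7025550147"})
print(first, second, to_notify)
```

Both advantages show up here: the second analyst learns that someone else already asked, and when the record finally arrives the store can say which earlier askers should hear about it.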

Jonas’s blog is

Lessons from “21”, aka “Bringing Down the House”

The film “21” is (loosely) based on Ben Mezrich’s book “Bringing Down the House”. It describes the actions of a group of card counters who managed to make a lot of money from North American casinos.

Las Vegas is an interesting environment in which to think about adversarial knowledge discovery because it’s so well-developed, because there’s money to spend on ideas that work, and because it’s fairly easy to measure how well tools are working. For example, the expected return from every game is tracked in real time, and deviations attract attention within a very small time window.
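
As a rough illustration of what "deviations attract attention" could mean, here is a sketch that compares a player's results over a window of hands with what the game's expected return predicts. It is not how any casino actually does it. The function names, the per-hand expectation of -0.005 betting units, the standard deviation of 1.15 units, and the threshold of 3 are all illustrative assumptions.

```python
import math


def deviation_z_score(player_results, expected_per_hand, stdev_per_hand):
    """z-score of a player's total result against the expected total for that many hands."""
    n = len(player_results)
    if n == 0:
        return 0.0
    observed = sum(player_results)
    expected = n * expected_per_hand
    return (observed - expected) / (stdev_per_hand * math.sqrt(n))


def should_flag(player_results, expected_per_hand=-0.005, stdev_per_hand=1.15, threshold=3.0):
    """Flag a window of hands in which the player is winning far more than chance suggests."""
    return deviation_z_score(player_results, expected_per_hand, stdev_per_hand) > threshold


# Hypothetical windows of 100 hands, each result in betting units (+1 win, -1 loss).
ordinary_run = [1.0] * 60 + [-1.0] * 40
unusual_run = [1.0] * 80 + [-1.0] * 20

print(should_flag(ordinary_run), should_flag(unusual_run))
```

A good night at the tables stays under the threshold; a run that is very unlikely under the game's normal odds does not.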

Jeff Jonas, now at IBM, has a lot of experience in this world — his Non-Obvious Relationship Awareness tool was/is heavily used by casinos to find an individual using more than one identity. If you ever have the chance to hear him speak, it’s always fun. Just one quote: “What happens in Vegas, stays in Vegas — on video”.

The main lesson from the book (and maybe the film) is how important social engineering can be in knowledge discovery. It’s no good using sophisticated data collection, and heavy-lifting data analysis, if the results are discounted because they don’t fit the preconceptions of the people who have to make decisions and take actions as the result of the knowledge.

In the casino context, the card counters played roles that were crafted to look like the people casinos want to see — people who, even when they’re winning big right now, will come back and lose even more tomorrow.

Building resistance to social engineering is difficult because it doesn’t lend itself to technological solutions. The “failure of imagination” with which the 9/11 Commission charges U.S. intelligence agencies is largely a social engineering issue. There were scattered pieces of information around, but they were discounted and/or ignored because nobody really believed in the actual possibility of an attack of that magnitude.

We don’t know a lot about how to increase imagination.