Super Identities

I heard a talk on the UK Super Identity Project last week which stimulated some musings on this important topic.

Once upon a time, almost everyone lived in villages, and identity was not an issue: everyone you knew also knew you, and many of them had known you since you were born. Identity problems hardly arose, apart from the occasional baby substitution (but see Solomon in 1 Kings 3:16-28 for an early identity dispute). As rich people began to travel, new forms of identity evidence, such as passports and letters of introduction, were developed.

About a hundred years ago, as a result of mass movement to cities, questions of identity became common. You can see from the detective stories of the time how easy it was to assume another identity, and how difficult it was to verify one, much as it is in cyberspace today. To deal with these issues, governments became involved as the primary definers of identity, getting in on the act with birth certificates (before that, records such as baptismal registers) and then providing a continuous record throughout life.

In parallel, biometric identifiers were developed, mostly for law enforcement: first the Bertillon system and then fingerprints (although, as I’ve noted here before, one of the first detective stories to include fingerprints, The Red Thumb Mark, is about how easy they are to forge).

The Super Identity project is trying to fuse a set of weak identifiers into a single identity with some reliability. Identities matter for three main reasons: (a) trust, for example so that I can assume that someone I’m interacting with online is the person I think it is; (b) monetization, for example so that an advertiser can be sure that a customized ad is being sent to the right person; and (c) law enforcement and intelligence, for example to establish that several apparent identities actually belong to the same underlying person.

There are many identifying aspects, almost all of which are bound to a particular individual in a weak way. They come in four main categories:

  1. Physical identifiers such as an address, or a place of employment.
  2. Biometrics (really a subset of the physical) such as fingerprints, iris patterns, voice and so on. At first glance these seem to be rather strongly bound to individuals, but all is not as it appears: they can often be forged in practice, if not in theory. An important and often-forgotten subset of biometrics arises from subconscious processes; these include language use and certain kinds of tics and habits. In many ways they are more reliable than physical biometrics because they tend to be hidden from us, and so are harder to control.
  3. Online identifiers such as email addresses, social-network presence and web pages, which are directly connected to individuals. Equally important are the indirect online identifiers, such as location, that appear as an (often invisible) side-effect of online activity.
  4. Identifiers associated with accessing the online world, that is, identifiers that bridge from the real world to the online world. These include IP addresses, beloved by governments despite their weakness; an IP address recently led to a police raid, complete with stun grenades, on an innocent household.

The problem with trying to fuse these weak identifying aspects into a single superidentity which can be robustly associated with an individual is this: it’s relatively difficult to avoid creating these identifying aspects, but it’s relatively easy to create more identifying aspects that can be used either to actively mislead or passively confuse the creation of the superidentity.
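To make the asymmetry concrete, here is a minimal sketch of one common way to fuse weak, independent identifying aspects: summing their log-likelihood ratios, naive-Bayes style. This is an assumption chosen for illustration, not the Super Identity project's actual method, and the aspect names, ratios and prior are all hypothetical. Each genuine aspect nudges the match score up, but a single planted aspect that contradicts them can drag it back down.

```python
import math


def fused_log_odds(prior_prob, likelihood_ratios):
    """Combine a prior with independent evidence by summing log-likelihood ratios.

    Each ratio is P(observation | same person) / P(observation | different people).
    Assumes the observations are conditionally independent (naive Bayes).
    """
    log_odds = math.log(prior_prob / (1.0 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return log_odds


def to_probability(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical weak aspects suggesting two accounts share an owner.
genuine = {
    "similar_userid": 4.0,
    "same_city": 2.0,
    "similar_writing_style": 5.0,
}
# Hypothetical decoy evidence planted to point away from a match.
decoy = {"conflicting_timezone": 0.05}

prior = 0.01  # assumed base rate that two random accounts share an owner

p_genuine = to_probability(fused_log_odds(prior, genuine.values()))
p_with_decoy = to_probability(
    fused_log_odds(prior, list(genuine.values()) + list(decoy.values()))
)
print(f"genuine evidence only: {p_genuine:.3f}")
print(f"with one decoy:        {p_with_decoy:.3f}")
```

With these made-up numbers, three weak aspects lift a 1% prior to roughly 29%, while one planted contradiction pulls it back to about 2%. The independence assumption is also a weak point in practice: aspects such as userid and email address tend to co-vary, so summing their ratios overstates confidence.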

For example, there’s been some success in matching userids from different settings (Gmail, Facebook, Flickr) and attributing them to the same person. But surely this can only work as long as that person makes no effort to prevent it. If I want to make it hard to match up my different forms of web presence, I can choose userids that don’t associate in a natural way, but I can also create extra bogus accounts that make the matching process much harder, if only from a computational point of view.
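A minimal sketch of the matching problem, assuming a naive approach: score every cross-service pair of userids with the string similarity from Python's `difflib` and keep pairs above a threshold. The userids, the 0.6 threshold and the helper names are hypothetical, and real systems use far richer features. The point is that bogus accounts both multiply the number of pairs to score and create spurious candidates that compete with the true match.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two userids (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_matches(ids_a, ids_b, threshold=0.6):
    """Score every cross-service pair and keep those at or above the threshold."""
    pairs = []
    for a, b in product(ids_a, ids_b):
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical accounts on two services.
service_a = ["jsmith", "alice99", "bob_k"]
service_b = ["j.smith", "alice_1999", "robert.k"]

# Hypothetical decoy accounts created to blur the true match for "jsmith".
decoys_b = ["jsmith_", "j_smith2", "jsmyth", "smithj"]

results = {}
for label, ids_b in [("no decoys", service_b), ("with decoys", service_b + decoys_b)]:
    matches = candidate_matches(service_a, ids_b)
    results[label] = matches
    pairs_scored = len(service_a) * len(ids_b)
    print(f"{label}: {pairs_scored} pairs scored, {len(matches)} candidates")
    for m in matches:
        print("  ", m)
```

Without decoys, "jsmith" has a single candidate, "j.smith". With four decoys it has five, and "jsmith_" scores exactly as high as the genuine account, so string similarity alone can no longer pick out the real match. Note also that the cost of this all-pairs approach grows with the product of the account counts on each side.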

So it may be possible to create a cloud of identifying aspects, but it seems much more difficult to find the real person within that cloud, especially if they’re trying to make themselves hard to find. The Super Identity project would no doubt respond that most people aren’t trying to make themselves harder to identify. I doubt this; I think we’re moving to a world where obfuscation is going to be the only way to gain some privacy: a world in which the only way to dissociate ourselves from something we don’t want made public is to make the connection sufficiently doubtful that it cannot reliably be acted on. This might be called self-spamming.

For example, if a business decides to offer differential pricing to certain kinds of customers (which has already happened), then I want to be able to dissociate myself from the category that gets offered the higher price if I possibly can. If the business has too good a model of my identity, I may not be able to prevent them treating me the way they want to rather than the way I want them to. (This is, of course, why almost all data mining is, in the end, going to be adversarial.)

In the end, behavior is the best signal of identity because it’s hard for us to modify, partly because we don’t have conscious awareness of much of it, and partly because we don’t have conscious control even when we have awareness. No wonder behavior modelling is becoming a hot topic, particularly in the adversarial domain.
