Posts Tagged 'biometrics'

Biometrics are not the answer for authentication

I’ve pointed out before that biometrics are not a good path to follow to avoid the obvious and growing issues with authentication using passwords.

Many biometrics suffer from being easy to spoof: pictures of someone’s iris, appropriately embedded in a background, can fool iris readers, a sheet of clingfilm can often cause a fingerprint reader to ‘see’ the last real fingerprint used on it, and so on.

But there’s a more pervasive problem with biometrics. The fact that a biometric is something you are is, on the one hand, a positive because you don’t have to remember anything, and wherever you go, there you are.

But, on the other hand, a biometric cannot be changed, and this turns out to be a huge problem.

Suppose you go to authenticate using a biometric. The device that captures your biometric must convert it to something digital, and then compare that digital value to a previously recorded value associated with you.

There are two problems:

  1. For a while, the device has your biometric data as plaintext. It may be encrypted very close to the place where it is captured, but there is a gap, and the unencrypted version can potentially be grabbed in the gap. There is always a temptation/pressure to use low-power sensors for capture, and they may not be able to handle the encryption.
  2. The previously recorded values must be kept somewhere. If this location can be hacked, then the encrypted versions of the biometric can be copied. These encrypted versions can then be used for replay attacks.
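The second problem, replay, can be sketched in a few lines of Python. This is a hypothetical system (all names invented), with a deterministic digest standing in for whatever fixed encrypted template a real deployment might store:

```python
import hashlib

# Hypothetical server-side store: user -> digest of the enrolled biometric template.
# A deterministic SHA-256 digest stands in for any fixed encrypted template.
enrolled = {}

def enroll(user, template_bytes):
    enrolled[user] = hashlib.sha256(template_bytes).hexdigest()

def authenticate(user, presented_digest):
    # The server only ever compares stored and presented digests,
    # so a stolen digest replays perfectly.
    return enrolled.get(user) == presented_digest

enroll("alice", b"iris-scan-of-alice")

# Legitimate login: the sensor digests a fresh capture of the same biometric.
fresh = hashlib.sha256(b"iris-scan-of-alice").hexdigest()
assert authenticate("alice", fresh)

# An attacker who copied the stored value never needs the biometric itself...
stolen = enrolled["alice"]
assert authenticate("alice", stolen)
# ...and unlike a password, Alice cannot rotate her iris to invalidate it.
```

Challenge–response protocols can blunt this for online systems, but the underlying point stands: the secret being protected can never be changed.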

Of course, there are defences. But, for example, if e-passports are to be used to enter multiple countries, then they must share a common repertoire of encryption techniques so that passports from many countries can be read by the same systems. So it’s not enough to argue that using a different mapping from biometric plaintext to encrypted form in each system will prevent these issues.

And if one person’s encrypted biometric is stolen, there’s no practical way to update the systems that rely on it (since they must continue to use the same mapping so that everyone else’s biometrics still work). More importantly, there’s no way to issue a fresh identity to the person whose data was stolen (“Go and have plastic surgery so that we can restore your use of facial recognition”).

Facial recognition

Many of the drama shows on television build on some kind of facial recognition: a set of faces flickering rapidly in the background, with a match found just as the main characters rendezvous in front of the screen.

I looked into the performance of facial recognition for my book “Knowledge Discovery for Counterterrorism and Law Enforcement” (Taylor and Francis, available from all good booksellers), but it’s been a while, so I thought I would go back and look at the current state of the art.

First, it probably doesn’t have to be said, but real systems don’t display all of the faces as they process them — if they did, it would slow them down by a factor of more than a thousand.

Second, what is their performance? There are many variables: camera angle, lighting, amount of space (and so detail) available for image storage.

There are also different versions of the problem. One important one is deciding if this specific stored image matches this just-captured image or not. This is what is used with biometric data stored in passports; there’s a digitized version of the photo you submitted in the chip in your passport; when you cross a border, a photo is taken of you (again, under quite controlled conditions) and that new photo is matched to the old. Even for such a 1-to-1 match the error rate is not trivial — I’ve seen 25% quoted which seems high, but agrees with my own experience.

The more common problem (in tv shows) is that an image has been captured from, say, CCTV, and the goal is to determine whether the person in that image is in a large database of identified images. In the jargon, the database images are called enrolled, and the newly collected one is called a probe.

Performance is usually characterized by giving a False Match Rate (FMR), the rate of matching a probe to an enrolled image when they aren’t actually the same person. So, for an access control system, this is the rate at which the system would let an intruder in. At present, values of around 0.001 (1 in a thousand) are typical. For this value, the dependent variable is the False Non-Match Rate (FNMR), the rate at which someone who does match gets missed. So, for an access control system, this means that a legitimate entrant gets locked out. These are typically in the range 0.03 (3 in a hundred) to about twice that.

You can see that these results are much, much weaker than those portrayed on tv. If the database contains a million images, then it’s not a case of exclaiming “We found a match” but “we found a thousand matches (and now we have to go through them and see if we think any of them is actually a match)”. Not finding a match would be much more surprising. Some systems seem to be much worse; you don’t have to look far to find stories of facial recognition systems that have never matched anyone, even when their images are known to have been captured as probes — part of the problem being that one person in a hoodie looks pretty much like any other person in a hoodie.

From an access control point of view, these rates mean that there’s a 1 in a thousand chance of an intruder getting in (which is probably acceptable for many situations), but 3% or more of the time, legitimate users will have to try again. I haven’t seen any data, but presumably the false non-matches are not uniformly distributed, so that some people have to try again much more often than others (i.e. not all faces are equally recognisable).
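The arithmetic behind these rates is worth making concrete. A minimal sketch, using the FMR and FNMR values quoted above and the (optimistic) simplification that errors are independent:

```python
FMR = 0.001   # false match rate quoted above (1 in a thousand)
FNMR = 0.03   # false non-match rate, low end of the quoted range

# Database search: every enrolled image is a separate chance of a false match,
# so a million-image database yields about a thousand spurious 'matches'
# for a single probe -- candidates a human then has to sift through.
enrolled_images = 1_000_000
expected_false_matches = enrolled_images * FMR

# Access control: chance a legitimate user is still locked out after k attempts,
# assuming (unrealistically) that attempts fail independently.
def still_locked_out(k):
    return FNMR ** k

for k in (1, 2, 3):
    print(k, still_locked_out(k))
```

If false non-matches cluster on particular faces, as suggested above, then per-attempt failures are correlated, and for those unlucky users the retry probability stays stubbornly close to the single-attempt rate rather than shrinking geometrically.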

Of course, these performance numbers are, if not in ideal conditions, then in reasonable conditions, whereas real images tend to be much more variable (weather, dust on the camera lens, shake on the mounting,…). And, of course, in real systems you can’t zoom in and miraculously produce more pixels as they seem to be able to do on tv. So there’s quite a long way to go. I think it’s fair to say that progress is being made — but facial recognition is a long way from production use.

Automating microexpression detection

One of the weaknesses of microexpressions as a way to detect deception is that it requires substantial training to learn to recognize them, and even then might need high speed video and several playbacks (think the TV show Lie to Me). It is no surprise, therefore, that there has been a lot of work trying to recognize microexpressions automatically.

At the recent European Conference on Intelligence and Security Informatics, Senya Polikovsky claimed in his talk that his group had been able to do it (although this isn’t in the paper: Polikovsky, Quiros-Ramirez, Kameda, Ohta and Burgoon, Benchmark Driven Framework for Development of Emotion Sensing Support Systems, which should be online quite soon). Admittedly, this is in a framework with a specific seat, specific lighting, and specific cameras and sensors, but it would still represent a significant advance. I got a brief chance to talk to him afterwards, and the approach he described seemed plausible, clever, and non-obvious.

Super Identities

I heard a talk on the UK Super Identity Project last week which stimulated some musings on this important topic.

Once upon a time, almost everyone lived in villages, and identity was not an issue — everyone you knew also knew you and many of them had done so since you were born. So identity issues hardly arose, apart from an occasional baby substitution (but note Solomon in 1 Kings 3:16-28 for an early identity issue). As rich people began to travel, new forms of identity evidence such as passports and letters of introduction were developed.

About a hundred years ago, as the result of mass movement to cities, questions of identity became common. You can see from the detective stories of the time how easy it was to assume another identity, and how difficult it was to verify one, much as it is in cyberspace today. To deal with these issues, governments became involved as the primary definers of identity, getting in on the act with birth certificates (before that, e.g. baptismal records), and then providing a continuous record throughout life.

In parallel, there’s the development of biometric identifiers, mostly to deal with law enforcement, first the Bertillon system and then fingerprints (although, as I’ve noted here before, one of the first detective stories to include fingerprints, The Red Thumb Mark, is about how easy they are to forge).

The Super Identity project is trying to fuse a set of weak identifiers into a single identity with some reliability. Identities are important for three main reasons: (a) trust, for example so that I can assume that someone I’m interacting with online is the person I think it is; (b) monetizing, for example so that an advertiser can be sure that a customized ad is being sent to the right person; and (c) law enforcement and intelligence, for example establishing that several apparently distinct identities are actually the same underlying person.

There are many identifying aspects, almost all of which are bound to a particular individual in a weak way. They come in four main categories:

  1. Physical identifiers such as an address, or a place of employment.
  2. Biometrics (really a subset of the physical) such as fingerprints, iris patterns, voice and so on. These at first glance seem to be rather strongly bound to individuals, but all is not as it appears and they can often be forged in practice, if not in theory. There is an important subset of biometrics that are often forgotten, those that arise from subconscious processes; these include language use, and certain kinds of tics and habits. They are, in many ways, more reliable than more physical biometrics because they tend to be hidden from us, and so are harder to control.
  3. Online identifiers such as email addresses, social network presence, web pages, which are directly connected to individuals. Equally important are the indirect online identifiers that appear as an (often invisible) side-effect of online activity such as location.
  4. Identifiers associated with accessing the online world, that is, identifiers associated with bridging from the real world to the online world. These include IP addresses, beloved by governments despite their weakness, one of which led to a recent police raid, complete with stun grenades, on an innocent house.

The problem with trying to fuse these weak identifying aspects into a single superidentity which can be robustly associated with an individual is this: it’s relatively difficult to avoid creating these identifying aspects, but it’s relatively easy to create more identifying aspects that can be used either to actively mislead or passively confuse the creation of the superidentity.

For example, there’s been some success in matching userids from different settings (gmail, facebook, flickr) and attributing them to the same person. But surely this can only work as long as that person makes no effort to prevent it. If I want to make it hard to match up my different forms of web presence then I can choose userids that don’t associate in a natural way — but I can also create extra bogus accounts that make the matching process much harder just from a computational point of view.
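A toy version of that matching-and-decoys argument can be written with Python’s standard library. The userids here are invented, and real linkers use much richer features than string similarity, but the quadratic blow-up is generic:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical userids one person uses across different services.
ids = ["jsmith", "j.smith42", "smithjohn"]

def similarity(a, b):
    # Crude string similarity in [0, 1] based on common subsequences.
    return SequenceMatcher(None, a, b).ratio()

# Naive linking: flag pairs above a threshold as 'probably the same person'.
linked = [(a, b) for a, b in combinations(ids, 2) if similarity(a, b) > 0.6]

# Decoy accounts inflate the work quadratically:
# n ids means n*(n-1)/2 pairwise comparisons.
decoys = [f"user{i}" for i in range(100)]
pool = ids + decoys
pairs = sum(1 for _ in combinations(pool, 2))
print(pairs)  # 103 ids -> 5253 pairwise comparisons
```

Choosing unrelated userids defeats the similarity test directly; creating bogus accounts defeats the process by cost, since every decoy has to be compared against everything else.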

So it may be possible to create a cloud of identifying aspects, but it seems much more difficult to find the real person within that cloud, especially if they’re trying to make themselves hard to find. The Super Identity project would no doubt respond that most people aren’t interested in making themselves harder to identify. I doubt this; I think we’re moving to a world where obfuscation is going to be the only way to gain some privacy — a world in which the only way to dissociate ourselves from something we don’t want made public is to make the connection sufficiently doubtful that it cannot reliably be acted on. This might be called self-spamming.

For example, if a business decides to offer differential pricing to certain kinds of customers (which has already happened), then I want to be able to dissociate myself from the category that gets offered the higher price if I possibly can. If the business has too good a model of my identity, I may not be able to prevent them treating me the way they want to rather than the way I want them to. (This is, of course, why almost all data mining is, in the end, going to be adversarial.)

In the end, behavior is the best signal of identity because it’s hard for us to modify, partly because we don’t have conscious awareness of much of it, and partly because we don’t have conscious control even when we have awareness. No wonder behavior modelling is becoming a hot topic, particularly in the adversarial domain.

How do I demonstrate that I am me?

The question of identity, how the question in the title gets answered, is one with an interesting history; and one that is changing again at the moment.

For much of human history, identity was almost completely determined by the fact that a person was born and grew up in a community where everyone knew them, and never moved far from this community. This is still true in many parts of the world, but was surprisingly true in the developed world until quite recently.

Things changed when migration to cities started in a big way, in Western countries perhaps around the 16th century and accelerating since then. Someone who moved to a city could become anyone they wanted, as long as they kept away from people from their home area, who might know them or know of them. This was harder than it seemed, mostly because of the tendency of people with the same origin to live contiguously when they arrived in a city (so if you were from X but didn’t live in the X area, you automatically attracted attention). This ability to assume new identities was grist to the mill of detective stories up to about 100 years ago (notably Austin Freeman).

In the last 100 years, governments have become the guarantors of identity because of the requirement to collect taxes, mostly income taxes; and, for an increasing number of people, because of the need to cross borders. So governments issue identity documents that are tied to a single person via some kind of link, perhaps a biometric or even an address. And, for most people, this is where things stand now.

But there are new forms of identity beginning to be created, and new ways to blur identities as well.

I have had a web page with my photo on it, and links to my papers, and so on, since the web began. Copies of this web page have been periodically archived, at moments that I can’t control, by the Wayback Machine and probably several other places as well. If I want to prove my identity, I can now do it without any government intervention by pointing to these copies of my web page, which have information that links them uniquely to me. For many people, their Facebook or LinkedIn profile pages would do the same thing if they were publicly archivable. So identity is once again moving away from something that is government mediated to something that is more decentralized and community based.

On the other side of the coin, governments and others are actively creating artificial personas, sometimes called sock puppets. These personas are controlled by a real person, but one person can control many of them, and the postings of each persona don’t need to be the ones that the controller would naturally make. In other words if, on the internet, nobody can tell you’re a dog, it follows that nobody can tell you’re not a construct either.

In order to make these sock puppets realistic, a back story has to be created for each one; increasingly, this means that they have to have a created trail in places where this might be looked for. Once upon a time, intelligence organizations would go into official records and create entries for non-existent people; this is inherently difficult, especially in records that are owned by other governments (remember, governments validated identities); so often identities of people who had died were used as starting points. I expect we’ll see that same thing happening in the online world.

But there’s an important difference: while governments can go back and change history embodied in records, neither they nor anyone else can change the history embedded in web sites that, at random times, take a snapshot of some part of the web. So creating realistic sock puppets is actually really difficult.

There’s also the issue of language: one controller running multiple sock puppets cannot avoid using detectably similar language patterns for all of them, and eventually this will make it possible to detect artificial personas.
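A crude illustration of why shared verbal tics leak through: compare character n-gram profiles by cosine similarity. The snippets are invented, and real stylometry uses far more features, but the mechanism is the same:

```python
from collections import Counter
from math import sqrt

def ngram_profile(text, n=3):
    # Frequency profile of overlapping character trigrams.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    # Cosine similarity between two frequency profiles (Counters).
    dot = sum(p[g] * q[g] for g in p)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm

# Two posts by the same (invented) controller share habitual phrasing...
a = "frankly, I cannot see why anyone would believe this, frankly it is absurd"
b = "frankly, nobody should believe such a claim; frankly, it is absurd on its face"
# ...while an unrelated (invented) writer sounds different at the trigram level.
c = "lol yeah that's totally fake news imo, who even buys this stuff"

same = cosine(ngram_profile(a), ngram_profile(b))
diff = cosine(ngram_profile(a), ngram_profile(c))
print(f"same controller: {same:.2f}, different writer: {diff:.2f}")
```

The habits being measured are largely subconscious, which is exactly why they are hard for a controller to suppress across many personas.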

Thoughts on the Australian Government White Paper on Counter-terrorism

The Australian government has just released a White Paper updating their policy on counterterrorism. Most of the content is eminently sensible, but there are a couple of questionable assumptions and/or directions.

1.  The section on resilience assumes that radicalisation can be mitigated by “reducing disadvantage” using government actions to address social and economic issues. This may well be so, but I don’t think there’s much evidence to support it. It’s clear that there are countries where economic and social grievances are significant drivers for radicalisation (e.g. Southern Thailand); but the results of a recent survey in Canada with which I was involved showed clearly that attitudes about economic and social issues were uncorrelated to radicalism. Although many Islamic immigrants to Canada (and indeed many immigrants) struggle with e.g. access to jobs, this does not seem to turn into a sense of grievance that might lead to radicalisation. Australia may be different, but there doesn’t seem to be any particular reason why it should be.

2. The section on intelligence-led counterterrorism talks about three components: the ability to collect; the ability to analyse; and the ability to share. There is existing capacity and proposed action for the first and the third — but there is a great black hole in both existing capacity and proposed action for the second: analysis.

It’s easy to skip over this word and assume what it means; but I suspect that, when it’s unpacked, it tends to be taken to mean either “looking stuff up” or “having a human put stuff together to discover its significance”. It doesn’t take much thought to realise that this can’t be enough. The challenge in intelligence is (a) deciding how important each dot is, and (b) finding the interesting constellations of dots from among the many possible constellations. In practice, the number of dots is in the thousands (and up) each day, so this process must be largely automated.

There is a strange blind spot about the role and importance of analysis. I suspect that this is mostly because it’s not obvious how powerful inductive data modelling can be and it’s not on the conceptual map of most people, especially those whose training has been in the humanities and social sciences. But talking about collection and sharing without talking about analysis is like a sandwich without the filling — and you don’t make a better sandwich by improving the quality of the bread, if there’s still no filling.

Analysis is tough for intelligence agencies, who are fighting a battle to upgrade their capabilities at the same time as meeting the real-time challenges of what analysis they can already do. And, although data mining/knowledge discovery is a well-developed subject, adversarial data mining, which I’ve often argued here is quite a different subject, has received little attention. One way that governments can help is to let some of this upgrading happen in universities. As far as I am aware, there is almost no work on counterterrorism analysis happening in Australian universities, and the possibility gets only a tiny mention in the National Security Science and Innovation Strategy. There are several research groups looking at the social aspects of terrorism and counterterrorism, and one or two looking at the forensic aspects of data analysis, but a conspicuous absence of work on data analysis as a preventive and preemptive tool.

A part of the report that has attracted media attention is the intent to impose special visa requirements for applicants from 10 as-yet-unidentified countries (but the US imposed special requirements on 10 countries, so it probably isn’t too hard to guess the list). Two parts of this are problematic. First, it will use new biometrics — although this seems to be a grand way of talking about fingerprints and facial photos. Biometrics get over-trusted; they are mostly relatively easy to spoof. Second, the report promises to use “advanced data analysis and risk profiling” to identify risky visa applicants. It’s hard to know what to make of this, but it sounds like either something quite weak, or something with unworkably high false-positive and false-negative rates.

3.  The problem with treating home-grown terrorism as a law enforcement problem is that catching and sentencing those who have planned or carried out attacks doesn’t do anything for those who are “next in line”. There’s a risk that dealing with a home-grown group simply radicalises their supporters to the point of violence. For example, this seems to be a potential risk after the sentencing of five men last week.

Other countries, for example Thailand and Saudi Arabia (although with questionable success), take a wider view and try to deradicalise those whose involvement with terrorist activity is marginal. In other words, any criminal events in the terrorism area are regarded as the tip of an iceberg; and other approaches (sometimes called “smart power”) are used to address the less-visible hinterland of the criminal event. While a law enforcement approach is good, there seems to be some scope for a wider approach to the problem. And the great majority of home-grown attacks have been discovered and prevented because of the actions of a whistle-blower within the attackers’ community, so motivating such whistle-blowing and making it easy seems like it should be a centrepiece of any proposed strategy.

More on Identity

In the two most recent posts, I’ve mentioned the problem of figuring out when data records describe the same person. Casinos are required to ban certain people who have self-identified as having a gambling problem, so they have to look carefully at everyone who books a room. They also, of course, have an interest in noticing when certain other people show up, for example card counters.

As I said yesterday, identity is a slippery thing to manage algorithmically. It’s only in the last century that governments have gotten into the act of certifying identity, via various forms of government-issued identification, going back to birth certificates.

Such documents are not necessarily very reliable. There’s a long history of forging them. But mostly identity gets fudged because people don’t use them directly — they copy names and addresses with characteristic human errors; and this process can be helped along by those who want to hide their identity. It’s socially acceptable to use variant names, and people constantly make mistakes with numbers. Those who want to can use these deniable mistakes to create multiple versions of their identities.
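Those “deniable mistakes” are exactly what record-linkage code has to absorb. A minimal sketch, with an invented nickname table and names (real matchers use far larger tables plus phonetic and fuzzy comparisons):

```python
import re

# A tiny, illustrative nickname table -- real systems use much larger ones.
NICKNAMES = {"bill": "william", "liz": "elizabeth", "bob": "robert"}

def normalise(name):
    # Lower-case, strip punctuation, expand known nicknames,
    # and sort the parts so 'Smith, Liz' and 'Liz Smith' agree.
    name = re.sub(r"[^a-z ]", "", name.lower())
    parts = [NICKNAMES.get(p, p) for p in name.split()]
    return " ".join(sorted(parts))

# Common variants and sloppy spacing collapse to one canonical form...
assert normalise("Bill Smith") == normalise("William  Smith")
assert normalise("Smith, Liz") == normalise("Elizabeth Smith")
# ...but not every deniable variation is caught, which is the loophole.
assert normalise("Wm. Smith") != normalise("William Smith")
```

Anyone who wants multiple identities only needs variations that fall outside whatever equivalences the matcher happens to implement.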

This is partly why there’s such an interest in biometrics. A biometric is an identity key that was given to you by God. The important distinction in biometrics is between a digital biometric and a non-digital one. A photo in a passport is a non-digital biometric — it can be used to associate the passport, and so its contents, with you, but doesn’t do much else. A digital biometric, such as a digitized photo, can act as a key to a large database of information about you.

Most biometrics are extremely easy to fool. You can read about some of the easy tricks here. Fingerprint scanners can be fooled by plastic wrap; iris scanners by printed photos of an iris.

In relationship/graph data, the problem with multiple records describing the same person is that they blur the structure of the connections around that person — making some paths seem longer, and some properties more diffuse. That’s why it’s important to be able to resolve identities when possible; but also why it’s important to stay agnostic over the long haul.
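The blurring effect is easy to see on a toy graph. In this sketch (names invented), one person recorded under two names breaks a path that merging the records restores:

```python
from collections import deque

def shortest(graph, s, t):
    # Plain breadth-first search over an adjacency dict;
    # returns the hop count from s to t, or None if unreachable.
    seen, q = {s}, deque([(s, 0)])
    while q:
        node, d = q.popleft()
        if node == t:
            return d
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return None

# 'J. Smith' and 'John Smith' are the same person recorded twice.
g = {
    "alice": ["J. Smith"],
    "J. Smith": ["alice"],
    "John Smith": ["bob"],
    "bob": ["John Smith"],
}
print(shortest(g, "alice", "bob"))  # None: the split identity hides the path

# Resolving the duplicate into one node restores the true structure.
merged = {
    "alice": ["Smith"],
    "Smith": ["alice", "bob"],
    "bob": ["Smith"],
}
print(shortest(merged, "alice", "bob"))  # 2
```

Merging wrongly, of course, creates the opposite distortion, shortening paths that should be long, which is the argument for staying agnostic when the evidence is thin.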