Secrets and authentication: lessons from the Yahoo hack

Authentication (proving that I’m allowed to access something or do something) is based on some kind of secret. The standard framing is that there are three kinds of secrets:

  1. Something I have (like a device that generates one-time keys),
  2. Something I am (like a voiceprint or a fingerprint), or
  3. Something I know (like a password).

There are problems with the first two mechanisms. Having something (a front door key) is the way we authenticate getting into our houses and offices, but it doesn’t transfer well to the digital space. Being something looks like it works better but suffers from the problem that, if the secret becomes widely known, there’s often no way to change the something (“we’ll be operating on your vocal cords to change your voice, Mr. Smith”). Which is why passwords tend to be the default authentication mechanism.

At first glance, passwords look pretty good. I have a secret, the password, and the system I’m authenticating with has another secret, the encrypted version of the password. Unfortunately, the system’s secret isn’t very secret, because the encrypted version of my password is almost always transmitted in the clear, thanks to the prevalence of wifi. Going from the system’s secret back to mine is supposed to be hard, which is what is meant to prevent my secret being reverse engineered from the system’s.

The problem is that the space of possible passwords is small enough that the easy mapping, from my secret to the system’s, can be tried for all strings of reasonable length. So brute force enables the reverse engineering that was supposed to be hard. Making passwords longer and more random helps, but only at the margin.
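
To make the brute-force point concrete, here is a minimal sketch in Python (the password, the hash function, and the candidate alphabet are all illustrative assumptions, not details of any real system): given only the stored digest, exhaustively hashing every short string recovers the password.

```python
import hashlib
import itertools
import string

# Hypothetical leaked system secret: the digest of a short password.
leaked_digest = hashlib.sha256(b"cab7").hexdigest()

# The "easy mapping" from my secret to the system's, tried for every short string.
alphabet = string.ascii_lowercase + string.digits

def brute_force(digest, max_len=4):
    for length in range(1, max_len + 1):
        for candidate in itertools.product(alphabet, repeat=length):
            guess = "".join(candidate)
            if hashlib.sha256(guess.encode()).hexdigest() == digest:
                return guess
    return None

print(brute_force(leaked_digest))  # recovers "cab7" by exhaustive search
```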

We could instead make the secret a function rather than a string. As the very simplest example, the system could present me with a few small integers, and my authentication would be based on knowing that I’m supposed to add the first two and subtract the third. My response to the system is the resulting value. Your secret might be to add the first and the third and ignore the second.
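
A toy sketch of what such a challenge-response might look like (the rules, the range of the integers, and the check are all invented for illustration):

```python
import random

# My secret rule: add the first two challenge numbers and subtract the third.
def my_rule(a, b, c):
    return a + b - c

# Your secret rule: add the first and the third and ignore the second.
def your_rule(a, b, c):
    return a + c

# The system presents a few small integers and checks my response against
# its own stored copy of my rule.
challenge = [random.randint(1, 20) for _ in range(3)]
response = my_rule(*challenge)
print(challenge, response, response == my_rule(*challenge))
```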

But limitations on what humans can compute on the fly mean that the space of functions can’t actually be very large, so this doesn’t lead to a practical solution.

Some progress can be made by insisting that I and the system each hold a different secret. Then a hack of the system, or of me by phishing, isn’t by itself enough to gain access to the system. There are a huge number of secret sharing schemes of varying complexity, but for the simplest example: my secret is a binary string of length n, and the system’s secret is another binary string of length n. We exchange encrypted versions of our strings, and the system authenticates me if the exclusive-or of its string and mine has a particular pattern. Usefully, I can also find out whether the system is genuine by carrying out my own check. This particular pattern is (sort of) a third secret, but one that neither of us has to communicate and so is easier to protect.
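
A minimal sketch of that simplest example, leaving out the encryption of the strings in transit and using an arbitrary illustrative pattern:

```python
import secrets

n = 16                          # length of the binary strings
pattern = 0b1010011001011001    # the 'third secret': the required xor pattern

# The two secrets are set up so that their exclusive-or equals the pattern.
my_secret = secrets.randbits(n)
system_secret = my_secret ^ pattern

def accept(own_string, received_string):
    return (own_string ^ received_string) == pattern

print(accept(system_secret, my_secret))   # the system authenticates me: True
print(accept(my_secret, system_secret))   # I check that the system is genuine: True
```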

This system can be broken, but it requires a brute force attack on the encrypted version of my secret, the encrypted version of the system’s secret, and then working out what function is applied to merge the two secrets (xor here, but it could be something much more complex). And that still doesn’t get access to the third secret.

Passwords are the dinosaurs of the internet age; secret sharing is a reasonable approach for the short to medium term, but (as I’ve argued here before) computing in compromised environments is still the best hope for the longer term.

Computing in Compromised Environments

As I’ve argued before, the Castle Model of cybersecurity is pretty much doomed — there’s no harm in antivirus and antimalware tools, but they provide only modest defence in a world where adversaries have access to the source code of the systems and tools that we run. Nobody, even at the high end, can assume that their systems haven’t been infiltrated by adversaries.

So if it’s impossible to keep the Vikings from roaming the hallways of the castle looking for things to steal, can anything be done to allow useful work to get done and at the same time protect against issues such as theft of intellectual property? The answer is yes, but it requires a change of mindset.

First, most things that can be stolen from the online world are not like pots of gold or the secret formula for antigravity — things for which existence is the fundamental property. Rather, most things that can be stolen are about choices from alternatives: will the tender bid be for this many dollars or that many dollars? Is the system going to use this technique or that technique? Is the software code going to be like this or like that? In other words, the property can be protected by adding uncertainty — if something is stolen but it may or may not be the true thing, then the stealing is much less rewarding, and might be useless.

As a concrete example, suppose the CEO is recommending to the Board that the business move in direction A, and this information is contained in a briefing note online. If there is also a briefing note recommending a move in direction B, and one recommending direction C, and it’s not possible to tell which is the true one, then the theft of any or all of them provides adversaries with little information.

So the heart of the idea is to replicate information so that the true information is hidden in a welter of similar information that is interestingly different.

Making this idea work requires a couple of technical pieces which are buildable. For simplicity, I’ll describe the system in the case where there are only two copies of each document, but everything extends straightforwardly to as many replicas as you want, so that the uncertainty can be made arbitrarily large.

The first part is to defeat the possibility of working out which are the real documents by traffic and behavioral analysis. The ‘trick’ here is to use the ideas developed for the Frankenstein malware — create the fake documents from pieces of real documents, and create the fake editing actions by pasting together real editing actions. In other words, whenever a human carries out a sequence of edits, the actions and their timing are captured and replayed against fake documents. Thus even an observer with access to the complete system from the inside cannot distinguish between a live human working on a document and a piece of software doing the same (not, at least, without keystroke loggers, and even that can be worked around).
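
A sketch of the capture-and-replay idea (the action format, the timings, and the fake-document handling are all made up for illustration):

```python
import random
import time

# Edit actions captured from a human working on a real document:
# (seconds since session start, action).
captured_session = [
    (0.0, ("insert", 120, "the tender bid is ")),
    (1.8, ("insert", 138, "$4.2M")),
    (3.5, ("delete", 138, 5)),
    (4.1, ("insert", 138, "$3.9M")),
]

def replay(session, apply_action, jitter=0.3):
    """Replay captured actions against a fake document with human-like timing."""
    previous = 0.0
    for offset, action in session:
        time.sleep(max(0.0, offset - previous + random.uniform(-jitter, jitter)))
        apply_action(action)
        previous = offset

# In a real deployment apply_action would edit the fake copy; here it just prints.
replay(captured_session, apply_action=lambda act: print("fake-document edit:", act))
```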

There are some obvious special cases: it helps to insert or remove ‘not’ around verbs, and it helps to change numbers in arbitrary ways. The point is not that the fakes should look plausible under careful analysis; it’s that they shouldn’t be detectable as fakes by automated analysis. Note that many real documents also pass through unpolished and perhaps contradictory states as they are developed.
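
A toy illustration of those two special cases (the verb list, the regular expressions, and the numeric range are arbitrary choices; a real generator would be considerably more careful):

```python
import random
import re

def perturb(sentence):
    # Flip a simple negation: remove a " not " if present, otherwise insert one.
    if " not " in sentence:
        sentence = sentence.replace(" not ", " ", 1)
    else:
        sentence = re.sub(r"\b(is|are|will|should)\b", r"\1 not", sentence, count=1)
    # Change every number by an arbitrary factor.
    return re.sub(
        r"\d+",
        lambda m: str(max(1, int(int(m.group()) * random.uniform(0.5, 2.0)))),
        sentence,
    )

print(perturb("The bid will be 4200000 dollars and is not contingent on approval."))
```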

So the basic mechanism is that humans work on the real documents but software simulates humans working on the fake documents. Of course, the humans should be encouraged to work on the fake documents occasionally too.

The second part, then, is how the humans know which documents are the real ones in such a way that someone lurking inside the system can’t. Let’s suppose that each file exists with two name variants, fnA and fnB, one of which is real and the other fake. To let the humans keep track of which is real, we need one offline secret. Each user is given an integer which is their part of the secret. Each time they log on, the system sends them another random integer (chosen from a fixed range, large enough that it is difficult for adversaries to infer what the range might be). If this random number is greater than the user’s number, then version A is the real one; if it is smaller, then version B is. (This is a very simple version of Shamir’s secret sharing scheme, and all of the more sophisticated versions, including updating regimes, can be slotted in here.)
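
A minimal sketch of that comparison, under the assumption that the system chooses each logon number so that the comparison points at whichever name currently holds the real content (the range, the offline secret, and the file names are illustrative):

```python
import secrets

RANGE = 2**32                        # fixed range of the per-logon numbers
user_offline_secret = 1_913_774_041  # the integer handed to this user offline

def pick_logon_number(offline_secret, real_variant):
    """System side: choose a random number that points the user at the real variant."""
    if real_variant == "fnA":        # need a number greater than the offline secret
        return offline_secret + 1 + secrets.randbelow(RANGE - offline_secret - 1)
    return secrets.randbelow(offline_secret)   # otherwise a smaller number

def which_is_real(offline_secret, logon_number):
    """User side: greater means fnA is real this session, smaller means fnB."""
    return "fnA" if logon_number > offline_secret else "fnB"

logon_number = pick_logon_number(user_offline_secret, "fnA")
print(logon_number, which_is_real(user_offline_secret, logon_number))  # -> fnA
```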

A user cannot infer any other user’s offline secret; nor the range of the random numbers (although an adversary can know this since they can steal the code that implements it); and knowing someone else’s offline secret adds nothing. Each user’s offline secret can be changed at any time, even without any online consultation if the range of random numbers is allowed to be known in the offline world. The system itself can permute file names or make the apparent file names user-dependent with a few tweaks of the way in which numbers are generated. More complex secret-sharing can require more than one user to share their offline secrets to enable access to the true versions of particular files.

This looks, at first glance, like a lot of work. But the costs of our current security schemes are non-trivial, and both cycles and storage are relatively cheap. This scheme even makes it possible to use clouds again, something that has been pretty effectively torpedoed by the revelations of the level of interception in Five Eyes countries in the past week.