Computing in Compromised Environments

As I’ve argued before, the Castle Model of cybersecurity is pretty much doomed — there’s no harm in antivirus and antimalware tools, but they provide only modest defence in a world where adversaries have access to the source code of the systems and tools that we run. Nobody, even at the high end, can assume that their systems haven’t been infiltrated by adversaries.

So if it’s impossible to keep the Vikings from roaming the hallways of the castle looking for things to steal, can anything be done to allow useful work to get done and at the same time protect against issues such as theft of intellectual property? The answer is yes, but it requires a change of mindset.

First, most things that can be stolen from the online world are not like pots of gold or the secret formula for antigravity — things for which existence is the fundamental property. Rather, most things that can be stolen are about choices from alternatives: will the tender bid be for this many dollars or that many dollars? Is the system going to use this technique or that technique? Is the software code going to be like this or like that? In other words, the property can be protected by adding uncertainty — if something is stolen but it may or may not be the true thing, then the stealing is much less rewarding, and might be useless.

As a concrete example, suppose the CEO is recommending to the Board that the business move in direction A, and this information is contained in a briefing note online. If there is also a briefing note recommending a move in direction B, and one recommending direction C and it’s not possible to tell which is the true one, then the theft of any or all of them provides adversaries with little information.

So the heart of the idea is to replicate information so that the true information is hidden in a welter of similar information that is interestingly different.

Making this idea work requires a couple of technical pieces which are buildable. For simplicity, I’ll describe the system in the case where there are only two copies of each document, but everything extends straightforwardly to as many replicas as you want, so that the uncertainty can be made arbitrarily large.

The first part is to defeat the possibility of working out which are the real documents by traffic and behavioral analysis. The ‘trick’ here is to use the ideas developed for the Frankenstein malware — create the fake documents from pieces of real documents, and create the fake editing actions by pasting together real editing actions. In other words, whenever a human carries out a sequence of edits, the actions and their timing are captured and replayed against fake documents. Thus even an observer with access to the complete system from the inside cannot distinguish between a live human working on a document and a piece of software doing the same (not, at least, without keystroke loggers, and even that can be worked around).

There are some obvious special cases: it helps to insert or remove ‘not’ around verbs; and it helps to change numbers in arbitrary ways. The point is not that the fakes should look plausible to careful analysis — it’s that they shouldn’t be detectable as fakes using automated analysis. Note that many real documents exist in unpolished and perhaps contradictory states as they are developed as well.

So the basic mechanism is that humans work on the real documents but software simulates humans working on the fake documents. Of course, the humans should be encouraged to work on the fake documents occasionally too.

The second part, then, is how the humans know which documents are the real ones in such a way that someone lurking inside the system can’t. Let’s suppose that each file exists with two name variants: fnA and fnB, one of which is real and other fake. To let the humans keep track of which is real, we need one offline secret. Each user is given an integer which is their part of the secret. Each time they log on, the system sends them another random integer (which is chosen from a fixed range, large enough that it is difficult for adversaries to infer what the range might be). If this random number is greater than the user’s number, then version A is the real one, if it is smaller then version B is. (This is a very simple version of Shamir’s secret sharing scheme, and all of the more sophisticated versions, including updating regimes can be slotted in here.)

A user cannot infer any other user’s offline secret; nor the range of the random numbers (although an adversary can know this since they can steal the code that implements it); and knowing someone else’s offline secret adds nothing. Each user’s offline secret can be changed at any time, even without any online consultation if the range of random numbers is allowed to be known in the offline world. The system itself can permute file names or make the apparent file names user-dependent with a few tweaks of the way in which numbers are generated. More complex secret-sharing can require more than one user to share their offline secrets to enable access to the true versions of particular files.

This looks, at first glance, like a lot of work. But the costs of our current security schemes are non-trivial, and both cycles and storage are relatively cheap. This scheme even makes it possible to use clouds again, something that has been pretty effectively torpedoed by the revelations of the level of interception in Five Eyes countries in the past week.


0 Responses to “Computing in Compromised Environments”

  1. Leave a Comment

Leave a Reply

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: