Low Hanging Fruit in Cybersecurity III

Any attempt to decide whether a particular action is “bad” or “good” requires some model of what “good” actually means. The only basis for intelligent action in almost any setting is to be able to have a plan for the expected, but also a mechanism for noticing the unexpected — to which some kind of meta-planning can be attached. This is, of course, a crucial part of how we function as humans; we don’t hang as software often does, because if we encounter the unexpected, we do something about it. (Indeed, an argument along this line has been used by J.R. Lucas to argue that the human mind is not a Turing machine.)

But most cybersecurity applications do not try (much) to build a model of what “good” or “expected” or “normal” should be like. Granted, this can be difficult; but I can’t help but think that often it’s not as difficult as it looks at first. Partly this is because of the statistical distribution that I discussed in my last post — although, on the internet, lots of things could happen, most of them are extremely unlikely. It may be too draconian to disallow them, but it seems right to be suspicious of them.

Actually, three different kinds of models of what should happen are needed. These are:

  1. A model of what “normal” input should look like. For example, for an intrusion detection system, this might be IP addresses and port numbers; for a user-behavioral system, this might be executables and times of day.
  2. A model of what “normal” transformations look like. Inputs arriving in the system lead to consequent actions. There should be a model of how these downstream actions depend on the system inputs.
  3. A model of what “normal” rates of change look like. For example, I may go to a web site in a domain I’ve never visited before; but over the course of different time periods (minutes, hours, days) the rate at which I encounter brand new web sites exhibits characteristic patterns.

An exception to the first model shows that something new is happening in the “outside” world — it’s a signal of novelty. An exception to the second model shows that the system’s model of activity is not rich enough — it’s a signal of interestingness. An exception to the third model shows that the environment is changing.

Activity that does not fit with any one of these models should not necessarily cause the actions to be refused or to sound alarms — but it does provide a hook to which a meta-level of analysis can be attached, using more sophisticated models with new possibilities that are practical only because they don’t get invoked very often.
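As a rough illustration of the third kind of model, here is a minimal Python sketch that counts how many never-before-seen items (say, web domains) turn up within a sliding time window. The class name, the window length and the fixed limit are all assumptions made for the example; a real system would presumably learn the limit from history, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class NoveltyRateMonitor:
    """Tracks how often never-before-seen items appear within a sliding window."""

    def __init__(self, window_seconds, max_new_per_window):
        # Both parameters are illustrative assumptions, not values from the post.
        self.window_seconds = window_seconds
        self.max_new_per_window = max_new_per_window
        self.seen = set()
        self.new_times = deque()  # timestamps of recent first sightings

    def observe(self, item, timestamp):
        """Return (is_new, rate_exceeded) for a single observation."""
        is_new = item not in self.seen
        if is_new:
            self.seen.add(item)
            self.new_times.append(timestamp)
        # Forget first sightings that have fallen out of the window.
        while self.new_times and timestamp - self.new_times[0] > self.window_seconds:
            self.new_times.popleft()
        rate_exceeded = len(self.new_times) > self.max_new_per_window
        return is_new, rate_exceeded


monitor = NoveltyRateMonitor(window_seconds=60, max_new_per_window=2)
for domain, when in [("a.example", 0), ("b.example", 10), ("c.example", 30)]:
    print(domain, monitor.observe(domain, when))
```

A `rate_exceeded` result is not an alarm in itself; it is the kind of hook to which a more expensive meta-level analysis could be attached.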

Again, think of the human analogy. We spend a great deal of our time running on autopilot/habit. This saves us cognitive effort for things that don’t need much. But, when anything unusual happens, we can quickly snap into a new mode where we can make different kinds of decisions as needed. This isn’t a single two-level hierarchy; in driving, for example, we typically have quite a sophisticated set of layers of attention, and move quickly to more attentive states as conditions require.

Cybersecurity systems would, it seems to me, work much more effectively if they used the combination of models of expected/normal behavior, organized in hierarchies, as their building blocks.

Low Hanging Fruit in Cybersecurity II

If cybersecurity exists to stop bad things happening in computing systems, then it seems to me that there are several implicit assumptions that underlie many approaches and techniques that might not be completely helpful. These are:

  • The distinction between “good” (or “allowable”) and “bad” is a binary distinction;
  • The decision about this distinction has to be made monolithically in a single step;
  • The distribution of likely things that could happen is uniform (flat).

Even to write them explicitly shows that they can’t quite be right, but nevertheless I suspect they exist, unexamined, in the design of many security systems.

What happens if we remove these assumptions?

If the distinction between “good” and “bad” is not discrete, then our systems instead allocate some kind of continuous risk or suspicion to actions. This creates an interesting new possibility — the decision about what to do about an action can now be decoupled from how the action is categorized. This is not even a possibility if the only distinction we recognize is binary.

From a purely technical point of view, this means that many different kinds of risk-measuring algorithms can be developed and used orthogonally to decisions about what the outputs of these algorithms mean. Critical boundaries can be determined after the set of risks has been calculated, and may even be derived from the distribution of such risks. For example, bad things are (almost always) rare, so a list of actions ordered by risk will normally have a bulge of “normal” actions and then a small number of anomalous actions. The boundary could be placed at the edge of the bulge.
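One way to place a boundary "at the edge of the bulge" can be sketched as follows. The choice of median plus a multiple of the median absolute deviation (MAD) is an assumption made for illustration, as are the function names and the multiplier `k`; any robust estimate of where the bulge ends would serve the same purpose.

```python
import statistics


def bulge_boundary(risks, k=5.0):
    """Estimate where the bulk of risk scores ends: median + k * MAD.

    k is an illustrative tuning knob, not a recommended value. If every
    score is identical the MAD is zero and the boundary is the median.
    """
    centre = statistics.median(risks)
    mad = statistics.median([abs(r - centre) for r in risks])
    return centre + k * mad


def split_by_boundary(risks, k=5.0):
    """Separate scores into the 'normal' bulge and the anomalous tail."""
    boundary = bulge_boundary(risks, k)
    normal = [r for r in risks if r <= boundary]
    anomalous = [r for r in risks if r > boundary]
    return normal, anomalous


scores = [0.10, 0.12, 0.11, 0.13, 0.09, 0.12, 0.95]
normal, anomalous = split_by_boundary(scores)
print(len(normal), anomalous)
```

The scoring step and the boundary step are separate functions on purpose: the algorithm that produces `scores` can be swapped without touching the decision about where to draw the line.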

Second, what if the decision about whether to allow an action doesn’t have to be made all at once? Then systems can have defence in depth. The first, outer, layer can decide on the risk of a new action and decide whether or not to allow it. But it can be forgiving of potentially risky actions if there are further layers of categorization and defence to follow. What it can do is to disallow the clearly and definitively bad things, reducing the number of potentially bad things that have to be considered at later stages.

From a technical point of view, this means that weaker but cheaper algorithms can be used on the front lines of defence, with more effective but more expensive algorithms available for later stages (where they work with less data, and so do not cost as much overall, despite being more expensive per instance).
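The layering described above can be sketched as a simple cascade. Both scoring functions here are hypothetical stand-ins (a length check and a string match), and the thresholds are arbitrary; the point is only the control flow, in which the expensive scorer runs solely on actions the cheap one did not already reject.

```python
def layered_decision(action, layers):
    """Run scorers from cheapest to most expensive.

    layers is a list of (scorer, block_at_or_above) pairs. Returns the index
    of the layer that blocked the action, or None if every layer let it pass.
    """
    for index, (scorer, block_at_or_above) in enumerate(layers):
        if scorer(action) >= block_at_or_above:
            return index
    return None


def cheap_score(action):
    # Hypothetical cheap check: absurdly long payloads are clearly bad.
    return 1.0 if len(action) > 1000 else 0.3


def expensive_score(action):
    # Hypothetical costly check, standing in for a slower, better model.
    return 0.9 if "DROP TABLE" in action.upper() else 0.1


LAYERS = [(cheap_score, 0.9), (expensive_score, 0.8)]

print(layered_decision("x" * 2000, LAYERS))
print(layered_decision("select 1; drop table users", LAYERS))
print(layered_decision("hello", LAYERS))
```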

Third, what if our defence took into account that the landscape of expected actions is not uniform, so that low probability events should automatically be treated as more suspicious? For example, spam filtering does lots of clever things, but it doesn’t build a model of the sources of my email, and flag emails from countries that I’ve never, ever received email from as inherently more likely to be spam. (Yes, I know that sender addresses can be spoofed.)

This idea has been used in behavioral profiling of computer activity, and it sort of works. But it needs to be combined with the ideas above, so that actions can be rated along a continuum from: routine (allow), to unusual but still not that unusual (allow, but maybe with a user question or at least logged for occasional inspection), to very unusual (user better explicitly allow), to bizarre (disallow). Windows has a weak version of this, which hasn’t been accepted well by users, but it flags only one thing (program start) and it doesn’t build a model of typical behavior by each user.
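The continuum above amounts to mapping a continuous suspicion score onto a small set of responses. A minimal sketch, assuming the score has already been normalised to the range 0 to 1; the band edges and response names are invented for the example and would need tuning per user and per system.

```python
# Band edges are illustrative assumptions, not values from the post.
BANDS = [
    (0.50, "allow"),          # routine
    (0.80, "allow_and_log"),  # unusual, but not that unusual
    (0.95, "ask_user"),       # very unusual
]


def response_for(suspicion, bands=BANDS):
    """Map a suspicion score in [0, 1] to one of four graded responses."""
    for upper_edge, response in bands:
        if suspicion < upper_edge:
            return response
    return "disallow"         # bizarre


for score in (0.1, 0.6, 0.9, 0.99):
    print(score, response_for(score))
```

Because the bands are data rather than code, the response policy can change without touching whatever algorithm produces the score.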

For example, the set of IP addresses with which my computer interacts is quite large, and hard to represent by some kind of convex structure, so intrusion detection doesn’t work very well if it depends on wrapping/categorising those IP addresses that are OK, and blocking traffic from those that are not. And usually the set of OK IP addresses is not derived from those I interact with, but encoded in some set of rules that apply to many computers. But if instead I built a model of the IP addresses I interact with, allowing older ones to get stale and disappear, and then looked at new IP addresses and allowed them if they resembled (tricky) those I already interact with, and asked me about the others, then this might work better than current approaches. An IP address is a hierarchical structure, with a possible country followed by the top octet, and so on, so I can discriminate quite finely about what it might mean. Even a web server that is theoretically visible to every other IP address could still benefit from handling unlikely source IP addresses differently.
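A minimal sketch of such a per-machine model, using Python's standard `ipaddress` module. "Resembles" is reduced here to the number of leading bits a new IPv4 address shares with any remembered peer, which is only one crude reading of the hierarchy; the class name, the 16-bit cut-off and the ageing rule are assumptions, and the country level is not modelled at all.

```python
import ipaddress


class KnownPeers:
    """Remembers recently contacted IPv4 addresses and lets old ones go stale."""

    def __init__(self, max_age_seconds, min_shared_bits=16):
        self.max_age_seconds = max_age_seconds
        self.min_shared_bits = min_shared_bits  # illustrative cut-off
        self.last_seen = {}                     # address as int -> timestamp

    def record(self, address, now):
        self.last_seen[int(ipaddress.IPv4Address(address))] = now

    def expire(self, now):
        """Drop peers that have not been seen within max_age_seconds."""
        self.last_seen = {a: t for a, t in self.last_seen.items()
                          if now - t <= self.max_age_seconds}

    def shared_prefix_bits(self, address):
        """Longest leading-bit match between address and any remembered peer."""
        target = int(ipaddress.IPv4Address(address))
        best = 0
        for known in self.last_seen:
            best = max(best, 32 - (target ^ known).bit_length())
        return best

    def classify(self, address, now):
        """Return 'allow' for addresses close to known peers, else 'ask'."""
        self.expire(now)
        if self.shared_prefix_bits(address) >= self.min_shared_bits:
            return "allow"
        return "ask"


peers = KnownPeers(max_age_seconds=100)
peers.record("192.0.2.10", now=0)
print(peers.classify("192.0.2.200", now=10))
print(peers.classify("198.51.100.7", now=10))
```

The addresses come from the ranges reserved for documentation. A fuller version would record an address once the user approves it, so that the model grows from actual behaviour.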

OK, maybe this isn’t exactly low hanging fruit, but the ideas are straightforward and (IMHO) should be built into the design of more robust systems.

Low Hanging Fruit in Cybersecurity

There’s a rapidly increasing interest in cybersecurity, partly because the penny is finally dropping about the financial impact on government and business, and not just individuals.

I don’t work directly in this area, but it is another adversarial domain, so some of the problems and approaches I think about have some applicability.

But cybersecurity does seem to be an area where some Pareto (80:20) thinking might not go amiss.

For example, botnets use machines that have been compromised but whose owners/users don’t realise this. In my experience there are two reasons for this:

  1. The owners don’t realise that things are not right with their machine and put anomalous network and disk traffic down to the general weirdness of computers;
  2. The machines do not really have an owner — they drive other bits of hardware, or they are shared among many people who each use them for a short time.

One thing that would really help would be for someone knowledgeable (not e.g. Lifehacker, although that would be a start, but someone from the cybersecurity community) to post a complete guide to hardening the major categories of PCs. Most people know that they should be running a spam filter and antivirus software, but from comments at a recent conference it seems that many people are still seeing spam, which suggests to me that they haven’t even got this simple part right. But there are many other tools that, if run on almost all machines, would cripple the ability to take them over. For example, I’ve been very happy with Iobit’s Security 360 which routinely finds malware on my machine behind a university “security system”. But I don’t use a VPN when I travel and I probably should. And what else should I be doing that I don’t even know is possible?

Volunteers anyone? (Of course, such a thing may exist, but I haven’t been able to find it.)

Finding significance automatically

In a world where lots of data is collected and available, the critical issue for intelligence, law enforcement, fraud, and cybersecurity analysts is attention.

So the critical issue for tools to support such analysts is focus: how can the most significant and interesting pieces of data/information/knowledge be made the easiest to pay attention to?

This isn’t an easy issue to address for many reasons, some of which I talked about a few posts ago in the context of connecting the dots. But the fundamental problems are: (1) significance or interestingness are highly context dependent, so where to focus depends, in a complex way, on what the analyst already knows and understands; and (2) every new piece of information has the potential to completely alter the entire significance landscape in one hit.

Many existing tools are trying, underneath, to address the issue of focus indirectly, by providing ways for analysts to control their own focus more directly. For example, there are many analyst platforms that allow available information to be sliced and diced in many different ways. These allow two useful things to be done: (1) dross (the guaranteed insignificant stuff) can be removed (or at least hidden from sight); and (2) the rest of the data can be rearranged in many different ways in the hope that human pattern-recognition skills can be brought to bear to find significance.

But it seems like a good idea to try and address the significance issue more directly. This has motivated a couple of the research projects I’m involved with:

  • The ATHENS system tries to find information on the web that is probably new to the user, but which s/he is well-positioned to understand; in other words, the new information is just over the horizon from the user’s current landscape. It builds this new information starting from a query that allows the user to provide the current context;
  • Finding anomalies in large graphs. Lots of data is naturally represented relationally as a graph, with nodes representing some kind of entities, and edges representing some kind of (weighted) similarity between some subset of the nodes (e.g. social networks). Graphs are difficult to work with because they don’t really have a representation that humans can work with — even drawing nice pictures of them tends to (a) occlude chunks once the graph gets big enough, and (b) hide the anomalous structure in the corners because the nice representation is derived from the big structure (e.g. the simple bits of the automorphism group). We’ve developed some tools that find and highlight anomalous regions, anomalous in the sense that, if you were to stand at their nodes and look at the landscape of the rest of the graph, it would look unusual.
  • Finding anomalies in text caused by either a desire to obfuscate the content that’s being talked about, or caused by internal mental state that’s unusual — being deceptive, or highly tense, for example.

Some other people are working in similar directions. For example, there is some work aimed at using social processes to help discover significance. In a sense, sites like Slashdot work this way — each user provides some assessment of quality/importance of some stories, and in return gets information about the quality/importance of other stories. This is also, of course, how refereed publications are supposed to work. The challenge is to contextualize this idea: what makes an object high quality/important for you may not mean anything to me. In other words, most significance lies somewhere on the spectrum from universal agreement to completely taste-based, and it’s hard to tell where, let alone compute it in a practical way.

Deception scores for the UEA emails

I’ve also calculated the deception scores for the UEA “climategate” emails, using the same methodology that I’ve written about in the context of the speeches of presidential candidates.

This doesn’t (yet) give any great results. This is partly because deception scores can only be computed for sets of similar documents. The UEA emails, however, fall into two broad classes: simple emails, and discussions and suggestions about more formal documents (papers and grant proposals). The language in these two classes is quite different, which makes them difficult to compare. For example, the base rates of first-person singular pronouns are very different.

What I have done is to see whether there are any patterns in deception scores with time. A strong change in either class of email should be detectable as a variation of score with time, which might be visible. The result is shown below, with the deception score axis running from right (low) to left (high), and the markers getting lighter with the passage of time.

Deception scores of UEA emails

The only thing that strikes me so far is that many emails with low deception scores are older in time. This might be taken to indicate some kind of change in the language patterns of these email users.

The released emails are a small and not very random set of all of the emails sent by these individuals. So not too much should be read into this plot.

Patterns of word usage in the UEA climate emails

I’m always pleased to see examples of real emails because they can act as testbeds for various textual analysis techniques. I’ve begun to analyse the “climategate” emails from the University of East Anglia. The figure below shows a plot of the structure of the words used. (This is quite a quick and dirty analysis — I didn’t try to remove email headers or otherwise clean up the content of the files.)

There are three parts to the structure. The arm to the right is an artifact of the fact that several Word files were included in the bodies of emails, rather than as attachments, so my extraction software sees them as part of the text. This can be fixed, but will take me some time.

The interesting property is the longitudinal structure from top to bottom in the figure. The phrases at the bottom are all content, while the phrases at the top are all identifiers of people and places (admittedly hard to see). Since the analysis algorithms know nothing of the semantics of emails, and are based purely on “bag of words” style analysis, this is an interesting, and unexpected, outcome.

Content in election speeches

I went back and looked at the collection of speeches by the three contenders in the 2008 U.S. presidential election, not from the point of view of spin, but just looking at what they were about. The results were a little surprising. Despite all of the thinking and polling that goes into a campaign, the speeches were about only three topics.

Structure of word usage in speeches

These three topics are: the economy, associated with words such as crisis, jobs, plan, pay; security, associated with words such as war, threat, Iraq, allies; and family, associated with words such as parents, children, living. This last group features only early in the campaign and in a single speech given by Obama on Father’s Day.

It’s no surprise that these topics appear. What does seem surprising is how little else was mentioned: other topic words are all hidden in the center of the figure, indicating that their pattern of appearance is not very interesting.

Questionoids as well as factoids

In the previous post I talked about the problem of “connecting the dots” and how this innocuous-sounding phrase conceals problems that we don’t yet know how to solve — some we don’t even know how to attack.

There’s another side to the story, and that’s the questions that are applied to the collection of factoids. These are important for two reasons.

1.  Asking a question for which a particular factoid is the answer should perhaps have some impact on the importance/interestingness of that factoid. This isn’t a magic bullet (because unknown unknowns might also be important, but won’t be looked for). But it’s a start.

(Google presumably is using some variant of this idea to weight the importance of web pages since Pagerank is based on explicitly created links as indicators of importance, but few of us create explicit links any more because it’s easier to go via Google — so some other indicators must surely come into play. But I haven’t seen anything public about this.)

2.  Asking a question for which there is no matching factoid does not mean that the question should be discarded (as it is in e.g. database systems). Rather such unanswered questions should become data themselves (and so should the answered ones). New factoids should be considered against the aggregate of these questions to see if they match; in other words, all queries should be persistent. That way, if someone asks about X and information about X is not known, the appearance of a factoid about X should cause a response to be generated, long after the analyst originally posed the question. Even if there was a factoid about X and so a response to the query, a new factoid about X will automatically generate a supplementary response.

In this view, which was first and most clearly enunciated by Jeff Jonas as part of the NORA and EAS systems, there are two forms of data: factoids and questionoids. Pairs of one of each “match” and cause a response to the outside world. But both kinds are worthy of meta-analysis, and the results of this analysis can be used to change the way the opposite kind of data is weighted.
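The idea of persistent queries can be sketched in a few lines. This is not how NORA or EAS work internally (nothing here is drawn from those systems); it is a toy store, with invented names, in which "matching" is reduced to an exact topic key, which is a drastic simplification.

```python
from collections import defaultdict


class PersistentStore:
    """Keeps both factoids and questions; each new arrival is matched
    against everything of the opposite kind that is already stored."""

    def __init__(self):
        self.factoids = defaultdict(list)   # topic -> factoids
        self.questions = defaultdict(list)  # topic -> analysts who asked

    def ask(self, analyst, topic):
        """Record the question permanently and return what is known now."""
        self.questions[topic].append(analyst)
        return list(self.factoids.get(topic, []))

    def add_factoid(self, topic, factoid):
        """Store the factoid and notify everyone who ever asked about it."""
        self.factoids[topic].append(factoid)
        return [(analyst, factoid) for analyst in self.questions.get(topic, [])]

    def who_else_asked(self, analyst, topic):
        """Questions are data too: list other analysts with the same question."""
        return [a for a in self.questions.get(topic, []) if a != analyst]


store = PersistentStore()
print(store.ask("alice", "X"))          # nothing known yet
print(store.add_factoid("X", "X was seen on Tuesday"))
print(store.who_else_asked("bob", "X"))
```

The return value of `add_factoid` is the delayed response: it is produced when the factoid arrives, however long ago the question was posed.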

The question data is also interesting in its own right. For example, an analyst may be interested to know that someone else has asked the same question as they did, even if it was asked months ago.

Connecting the dots is hard

In the aftermath of the Christmas Day attempted in-flight bombing, the issue of whether intelligence agencies should have been able to “connect the dots” beforehand has once again been heavily discussed.

Putting together disparate pieces of information to discover a pattern of interest is much more difficult than it looks, but the reason is subtle.

First, and obviously, it’s much easier to find some pieces of information that fit into a pattern when you already know the pattern, so connecting the dots after an incident always looks easy. This is just another way of saying that hindsight is 20/20.

What about the problem of putting together the pieces before an incident? Let’s suppose, for simplicity, that the pieces of information are simple factoids: this person did something, said something, bought something that might have been potentially suspicious/interesting. We could give each of these factoids a weight indicating how important we suppose it to be, perhaps based on its inherent unusualness or connection to a perceived risk; and perhaps also based on the reliability associated with it.

Even this simple first step is not straightforward because the weight depends to some extent on some perception of perceived modes of attack: buying the chemical components of an explosive seems like it should be high weight; but many other actions have ambiguous possible weights depending on what we (implicitly beforehand) think is plausible or likely. Buying Jello might indicate an interest in growing bacteria, or just a taste for a cheap dessert.

And if assigning weights to individual factoids is difficult, the difficulty is compounded by the sheer number of such factoids that exist. I don’t have any hard information, but from public statements we could estimate that perhaps 10,000 potential terrorists are being tracked around the world; on any given day, the number of factoids generated by their actions, communications, and web traffic could easily be hundreds of times greater.

So, it’s no surprise that individual factoids get underweighted when they first enter intelligence systems. The net effect of the failure to detect the recent attack is that all factoids will be given more weight, which, in relative terms, has no effect at all (except to keep already overstretched intelligence officers busier).

But this is the easy part. The connecting of these factoid dots is much, much harder.

First, the existence of a connection between two factoids can change (perhaps dramatically) the weight associated with both of them. So, theoretically at least, the potential association between each pair of factoids should be explored. In complexity terms, the number of comparisons is quadratic in the number of factoids: if there are 100 factoids, then there are 100×100 possible connections. Calculations that have complexity quadratic in the size of their inputs are just on the boundary of the practically doable: possible for small numbers of inputs, but taking too long for larger numbers of inputs. For 1,000,000 factoids per day, the number of pairwise connections to check is 1,000,000,000,000, just doable on special-purpose hardware at a central site. In practice, I suspect that only a smallish subset of these connections are actually considered in real-time, so there is now the possibility of failing to connect two dots just because there are a lot of dots and therefore a lot more possible connections.
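The arithmetic can be checked with a short calculation. Counting every ordered pair gives n×n; counting each unordered pair of distinct factoids once gives n(n−1)/2, about half as many, which does not change the quadratic growth. The comparison rate in the last line is a made-up figure, used only to show how the numbers scale.

```python
def ordered_pairs(n):
    """Every factoid against every factoid, as in the 100 x 100 figure."""
    return n * n


def unordered_pairs(n):
    """Each pair of distinct factoids counted once: n(n-1)/2."""
    return n * (n - 1) // 2


def seconds_to_check(n, comparisons_per_second):
    """Time to examine every unordered pair at an assumed, constant rate."""
    return unordered_pairs(n) / comparisons_per_second


for n in (100, 10_000, 1_000_000):
    print(n, ordered_pairs(n), unordered_pairs(n))

# Hypothetical rate of one billion comparisons per second.
print(seconds_to_check(1_000_000, 10**9))
```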

It gets worse. When a new connection changes the weights of the factoids that it connects, these changed weights now affect other factoids to which they are connected; and these in turn propagate a changed weight to the factoids to which they are connected; and so on. In other words, discovering a new connection between two factoids can alter the perceived weight of many, even all, of the other factoids. This means, among other things, that it’s hard to work with just part of the graph, because a change made somewhere else can radically change the meaning of the part.
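To make the ripple effect concrete, here is a toy propagation scheme. The damped, breadth-first rule below is an assumption invented for the example, not a description of how any real intelligence system updates weights; it simply shows how a change at one factoid can touch others several connections away.

```python
from collections import deque


def propagate_boost(weights, neighbours, start, boost, damping=0.5, min_boost=0.01):
    """Raise the weight of `start`, then pass a damped share along connections.

    Each factoid is updated at most once, by the first (largest) boost that
    reaches it. Returns the set of factoids whose weight changed.
    """
    changed = set()
    queue = deque([(start, boost)])
    while queue:
        node, amount = queue.popleft()
        if amount < min_boost or node in changed:
            continue
        weights[node] = weights.get(node, 0.0) + amount
        changed.add(node)
        for other in neighbours.get(node, ()):
            if other not in changed:
                queue.append((other, amount * damping))
    return changed


# A chain of four connected factoids: a - b - c - d.
links = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights = {name: 1.0 for name in links}
touched = propagate_boost(weights, links, "a", boost=0.4, min_boost=0.1)
print(sorted(touched), weights)
```

Even in this tiny example a single update at one end of the chain alters three of the four weights, which is why working with only part of the graph is risky.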

It gets even worse. The connections themselves, and aggregates of the connections, can become meta-factoids. For example, the fact that person A communicates with person B via phone but person B communicates with person A by email is potentially a useful factoid, revealing something about the communication infrastructure each has access to, their attitudes to it (perhaps their perception about security risks of different technologies), and even that they are trying to communicate covertly (since most people communicate symmetrically). The connections between factoids create a web or graph whose structure at many different levels can reveal relationships among the factoids that change their individual significance.

The bottom line is that it’s extremely hard to find sets of dots with interesting connections. Part of this is the sheer complexity of the data structure and the algorithms that would be required. But we actually don’t know much about the useful algorithms. These webs or graphs have many emergent properties. If we understood them, there are surely ways to focus attention on only parts of the data that have the greatest potential to lead to interesting factoids and connections. Network science is emerging as a new area of research where exactly these kinds of questions are being explored, but it is in its infancy, and we know only the most rudimentary properties of such structures: the common appearance of power laws, preferential attachment as a construction mechanism, some measures of importance of nodes within a graph, and so on. But the big theories remain elusive.

Why H1N1 spread so fast

I heard a talk last week at the Intelligence and Security Informatics Conference (ISI2009) about the models used for disease spread, and I realized why the WHO (and everyone else) were surprised by the speed with which the H1N1 flu spread. These models have different assumptions about the probability of spread from one person to another, how much time each individual is infectious, ill, recovering, and immune or not. But they tend to have one underlying assumption about spread, and that is that it’s a planar phenomenon. Spread is usually modelled as a differential equation, a kind of model which says that if 500 people are in a school and the probability of infection is 10% in a day, then 50 people will become infected.

The problem with these models is that they don’t take into account the “six degrees of separation” phenomenon. Although most people mix with only a small number of people who are geographically close, enough others mix with people who are geographically far away. As a result, after 3 transmissions, the infection hasn’t reached half the world’s population — but it has reached half way around the world!! Failing to take into account the connectivity between people makes the models far, far too conservative about spread.
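The effect of a few long-distance contacts can be seen in a toy contact graph. In this sketch 1,000 people sit on a ring and mix only with their two neighbours; a handful of long-range links are then added and the number of transmissions needed to reach everyone is recomputed. The ring, the choice of links and all the names are assumptions for illustration; this is not one of the epidemiological models from the talk.

```python
from collections import deque


def ring(n):
    """Each person mixes only with their two nearest neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def add_links(neighbours, links):
    """Add symmetric long-distance contacts to a contact graph."""
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def transmissions_needed(neighbours, source):
    """Breadth-first search: hops from the source to every reachable person."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for contact in neighbours[person]:
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return hops


N = 1000
local_only = transmissions_needed(ring(N), 0)
# Hypothetical travellers: every 100th person knows someone 350 places away.
travellers = [(i, (i + 350) % N) for i in range(0, N, 100)]
with_travel = transmissions_needed(add_links(ring(N), travellers), 0)
print(max(local_only.values()), max(with_travel.values()))
```

Only ten extra links are added to a graph with a thousand local ones, yet the worst-case number of transmissions drops sharply.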

Including the graph structure that connects people shows that quarantine mechanisms cannot possibly work. These long-distance connections apply at all scales, not just between countries. So if there’s an outbreak in a single city block, there will be some people who have travelled a few miles away before the infection is detected; in an outbreak in a city, there will be some people who have been to another city; and so on.

Of course, the work on “six degrees of separation” was based on communication, which does not always imply transmission. So the constants might be a bit larger; but it seems clear that the pass-the-parcel (pass-the-virus) graph can’t have much larger diameter.

Structure of social network graphs

Many researchers study social network graphs to try and understand how we as humans interact, especially in information systems and online. However, it has always been difficult to validate results because privacy concerns usually limit access to real datasets. Many results have been validated using artificial graphs, generated in a way that mimics the large-scale properties of real graphs. For example, the artificial graphs look like real graphs in the sense that they obey power laws, have the right kind of degree sequences and so on.

Often, preferential attachment is used as the construction technique. In this algorithm, an edge is attached to a vertex with a probability in proportion to the number of edges already attached to it. This seems intuitively plausible in many human settings: a person with many friends tends to meet more people and so has a greater chance of making more friends.
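A minimal sketch of preferential attachment, assuming the simplest variant in which each new vertex arrives with exactly one edge. Keeping a list in which every vertex appears once per incident edge makes a uniform choice from that list equivalent to a degree-proportional choice. The function name and the fixed seed are choices made for the example.

```python
import random
from collections import Counter


def preferential_attachment(n_vertices, seed=0):
    """Grow a graph one vertex at a time; each newcomer links to an existing
    vertex chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]  # each vertex listed once per edge it touches
    for new in range(2, n_vertices):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


edges = preferential_attachment(1000)
degree = Counter(v for edge in edges for v in edge)
print(degree.most_common(5))
```

With one edge per newcomer the result is a tree; generators used in practice usually add several edges per vertex, but the rich-get-richer mechanism is the same.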

There have been hints for a while that these artificial graphs were not quite like real graphs, even though they match according to many large-scale measures. For example, Newman showed that, in human graphs, high-degree nodes tended to be connected to high-degree nodes, while in technical networks this was not the case — even though both looked the same from a power-law perspective.

But now Faloutsos’s group at CMU have shown convincingly that there are substantial differences between artificial and real graphs. They looked at what happens to the diameter of a graph as edges are uniformly randomly deleted. As edges are deleted, the diameter grows slowly but, at some point, there’s a sharp increase. They call this the shatter point.

The important thing is that the shatter point (i.e. the fraction of edges remaining when this happens) is substantially higher for artificial graphs than for real graphs. In other words, graphs generated by humans rather than simply by preferential attachment are somehow tougher. Although humans must be choosing edges to connect based on local criteria, they must somehow do this in a way that makes the global structure of the graph more robust. It’s not at all clear (to me at least) how this happens, but it seems plausible.
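The experiment can be imitated on small graphs with a few lines of Python. This is a rough sketch, not the CMU group's code or their exact measure: "diameter" is taken here to be the longest shortest path among vertices that can still reach each other, which is an assumption, and the all-pairs search is only practical for small graphs. Note that once a graph has broken into small pieces this quantity can fall again.

```python
import random
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def diameter(adj):
    """Longest shortest path over pairs that can still reach each other."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for other in adj[node]:
                if other not in dist:
                    dist[other] = dist[node] + 1
                    queue.append(other)
        longest = max(longest, max(dist.values()))
    return longest


def diameter_under_deletion(edges, fractions_kept, seed=0):
    """Delete edges uniformly at random and record the diameter at each
    requested fraction of edges remaining."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    curve = []
    for fraction in fractions_kept:
        kept = order[:round(len(order) * fraction)]
        curve.append((fraction, diameter(adjacency(kept))))
    return curve


cycle = [(i, (i + 1) % 6) for i in range(6)]
print(diameter_under_deletion(cycle, [1.0, 5 / 6, 0.5, 0.0]))
```

Running the same function over an artificial graph and a real one, and comparing where the curve jumps, is the comparison the post describes.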

One of the implications of this difference is that it calls into question much of the conventional wisdom about social networks, whenever this has been derived from, or validated by, artificially generated datasets. Which is quite a lot of the time.

Deccan Mujahideen?

The “Deccan Mujahideen” have claimed responsibility for the Mumbai attacks today. It doesn’t seem clear who this group is, but there are two good reasons why it’s plausible that such a group might exist and have typical Salafist motivations.

The Deccan is the plateau in central eastern India, roughly contiguous with Andhra Pradesh with its capital, Hyderabad. Andhra is what remains of the domains of the Nizam and is a historically Muslim region. As part of the independence agreement, the Nizam was given considerable independence from the rest of India, an agreement that was quickly violated by the Indian government who ‘invaded’ within a year. There is thus a natural solidarity of feeling with Muslims in Kashmir who found themselves part of India rather than (Muslim) Pakistan because of their rajah who was a Hindu.

The descendants of the Nizams of Hyderabad have the best claim to be Caliph, which provides a second basis for radical Islam to be centred in Andhra. A useful source of background is the book The Last Nizam by John Zubrzycki (MacMillan).

OTOH a group called Indian Mujahideen, with much less geographical locality, threatened attacks on Mumbai after successful attacks they carried out on Jaipur. So there may be some opportunism and specious justification in the choice of the group name to take responsibility for these attacks.

There’s some background here

and, by now, in many other places.

Suspicion of an al Qaeda link arises because of the simultaneous attacks, which has been a favourite tactic. However, the range of attack modalities and the low level of skill required for some of them suggest to me that there were a large number of lightly trained participants. Which doesn’t rule out an al Qaeda supervisor, of course.

See also Times of India on this subject.

Call for Papers: Link Analysis, Counterterrorism and Security

The Call for the LACTS 2009 workshop is now available here.

The workshop takes place at the SIAM Data Mining Conference and brings together academics, practitioners, law enforcement, and intelligence people to talk about leading-edge work in the area of adversarial data analysis.

The workshop is intended primarily for early-stage work. The proceedings are published electronically, but authors may retain copyright.

The deadline for submissions is probably late December, but perhaps a little later (still being decided).

Cell phones for money laundering?

There’s been some recent discussion about the risks of being able to store money on cell phones and so to move it about in a way that’s hard to see using conventional tools. Of course, this isn’t really a new thing — putting money on a credit card before a trip and then using it in a different place is a well-known way of moving money across international borders (and, for a while, getting a decent exchange rate while doing it). You can find some of the discussion, in a counterterrorism setting, here.

This concern seems overblown to me. There are significant disadvantages to a terrorist in carrying and using an electronic device that is able to reveal where he is and, worse still, do so without making it obvious. There are a number of issues that require different amounts of skill to exploit:

  1. Cell phones that are turned on tell the nearest tower(s) roughly where they are. The tower can tell the direction in which the phone lies, and can estimate its distance. If multiple towers can see it, they can triangulate to get an even better position estimate. This ability is built in as part of the Enhanced 911 (E911) service that lets emergency services find someone in difficulty easily.
  2. Increasingly, cell phones know where they are because they have inbuilt GPS sensing. They can be interrogated for this information under certain circumstances (a beloved plot device in TV dramas). This data can be integrated with other software on a phone, providing other channels for it to be disseminated.
  3. Cell phones are not robust from a security point of view and it is relatively straightforward to install hacks on them. For example, you can find instructions for turning every call into a silent conference call with another phone.
  4. SIM cards can be cloned so that another phone in the same cell receives the same packets (although this seems likely to confuse the cell tower).
  5. Even without access to the telco system and the encrypted communication, the device is radiating and so all of the standard location technologies will work. (Picking the device of interest may be difficult in urban settings.)

All of which suggests that cell phones are not going to be the terrorists’ friend any time soon. If they don’t want to carry such devices, they are unlikely to want to use them as electronic wallets.

It may help to keep a cell phone turned off, but this assumes that there’s no backdoor that enables the phone to communicate even when powered down. And it has to be on to be used as a wallet.

Of course, there are anonymous cell phones around, but even this does not solve the problem. There are already data-mining services that attempt to predict when multiple phones are owned by the same person based on the pattern of cell towers that they use with what frequency.
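
As a rough illustration of how such a prediction could work, here is a minimal Python sketch that compares two phones by the frequency with which they use each tower. This is an assumption about one simple approach (cosine similarity), not a description of what any real service does; the function name, tower IDs, and counts are all made up for the example.

```python
from collections import Counter
from math import sqrt

def tower_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two phones' tower-usage counts (0.0 to 1.0)."""
    shared = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tower IDs and connection counts, for illustration only.
phone_x = Counter({"T1": 40, "T2": 25, "T7": 5})
phone_y = Counter({"T1": 38, "T2": 30, "T9": 2})
phone_z = Counter({"T4": 50, "T5": 20})

print(round(tower_similarity(phone_x, phone_y), 3))  # similar usage pattern
print(tower_similarity(phone_x, phone_z))            # no shared towers
```

A high score only says that two phones frequent the same towers in similar proportions; a real system would presumably also need timing information to separate one owner with two phones from two people who live and work together.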

The Hogwarts approach to hiding content in the web

There are several reasons why bad guys might want to make information available on the web, but not have it be accessible to everyone, especially to those who are looking for them. For example, they may want to disseminate propaganda, do recruitment, or provide training.

I’ve written before about ways to do this: passwords, unusual protocols, virtual worlds, and so on.

Another interesting possibility is to exploit the ability to serve web pages dynamically, providing different versions of a page both at different times and to requests arriving from different parts of the world (geolocation now being reasonably well established, good enough that e.g. I can’t access Pandora from Canada).

The essence of the idea is that the link target of a hyperlink can change without the anchor text, and so the appearance, changing. So, for example, a link may point to a completely innocuous page for 59 minutes in the hour, but between 13 and 14 minutes past, may point somewhere else. The exact time is now readily available, and connection speeds on the internet in most parts of the world are good enough that it is possible to visit a particular web page within a given minute. So, if you know in which minute the ‘gate’ is open, you can arrange to go through it.
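
A minimal sketch of the gate idea, in Python. The function name `link_target`, the two URLs, and the choice of minute 13 are illustrative assumptions; a page-rendering routine on the server would call something like this to decide which target to put behind the unchanged anchor text.

```python
from datetime import datetime, timezone

# Hypothetical targets, for illustration only.
INNOCUOUS_URL = "/news/index.html"
GATED_URL = "/news/index2.html"

def link_target(now: datetime, gate_minute: int = 13) -> str:
    """Return the href to serve behind the same anchor text at time `now`."""
    # The gate is open only during one minute of each hour.
    if now.minute == gate_minute:
        return GATED_URL
    return INNOCUOUS_URL

if __name__ == "__main__":
    print(link_target(datetime.now(timezone.utc)))
```

The sketch keys on the server's clock, so a visitor only needs a reasonably accurate clock of their own to arrive inside the window.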

Of course, there’s still the possibility (a 1 in 60 chance) that a random visitor will go through the ‘gate’ so the basic scheme needs some extension. First the concealed target should look exactly like the regular (59 minutes) target, so that it’s not obvious to an accidental visitor that anything unusual has happened. Second, the scheme can be chained so that a secret link from the second page opens up during some other minute, making it much less likely that an accidental visitor will go through two gates. And the chain can be made as long as necessary to reduce the probability of an accidental passage.
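
To see how quickly chaining helps, here is a small Python calculation. It assumes each gate is open for exactly one minute per hour and that an accidental visitor reaches each page at an independent, uniformly random time. The independence assumption is mine and is optimistic: someone who clicks straight through arrives at correlated times, so gates in adjacent minutes would give less protection than the figures suggest.

```python
from fractions import Fraction

def accidental_passage_probability(gates: int, minutes_per_hour: int = 60) -> Fraction:
    """Chance of passing every gate by accident, under the independence assumption."""
    return Fraction(1, minutes_per_hour) ** gates

for k in (1, 2, 3):
    print(k, accidental_passage_probability(k))
# 1 1/60
# 2 1/3600
# 3 1/216000
```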

It’s important that the url for each of the pages involved is dynamic so that the final destination (or any of the intermediate steps) can’t be bookmarked. Then each visitor must follow the path on each visit.

Of course, the knowledge required is the minutes when the window is open on each page, and not the exact path. Thus it is possible to open other possible paths during other minutes, and so provide red-herring paths for anyone who becomes suspicious and starts reloading a page every minute and seeing whether and how the source changes.

Of course, the pattern of minutes that the gates open can be made data dependent or changing in other ways that are computable by those who are in the know, but hard for others to figure out.

Since the key is a sequence of minute values in the range 0-59, it can be displayed fairly openly as long as its significance is not appreciated, for example, coded in the publication date and time of some otherwise innocuous and public document.

It’s possible that the length of the window can be reduced below a minute with some experience.

Knowledge Discovery for Counterterrorism and Law Enforcement

My new book, Knowledge Discovery for Counterterrorism and Law Enforcement, is out. You can buy a copy from:

The publisher’s website

Amazon.

(Despite what these pages say, the book is available or will be within a day or two.)

As the holiday season approaches, perhaps you have a relative who’s in law enforcement, or intelligence, or security? What could be better than a book! Or maybe you’d like to buy one for yourself.

(A portion of the price of this book goes to support deserving university faculty.)

Spore — a new covert channel

I’ve written before about how virtual worlds present new challenges to the surveillance of bad guys. In systems such as Second Life, it’s hard to see what someone is doing inside such a virtual world (unless you’re Linden).

The situation is even worse in WWW-like virtual worlds, such as the Multiverse. Here each piece of virtual world is owned by someone, and there are teleport mechanisms (like links) to move between them. Not only can an owner provide a safe place for bad guys to meet where it’s hard to surveil them, but they can also prevent someone being followed by crashing the site after it’s been used. (This is also possible for conventional web sites, which I’ll talk about in a later post.)

But new online games such as Spore open up new possibilities for communicating in hard to track ways. Spore is not really a multiplayer game, and it isn’t very obvious how much data sharing is going on. The point of the game is to build civilisations based on creating (designing) organisms which then ‘evolve’ through several stages.

Here’s the communication part. The game only works when it’s connected to the internet, and whenever a creature is created or evolves, it is uploaded to a central site, and then redistributed to the worlds of some other users.

This, by itself, would not be a very useful communication mechanism because the chances of a particular creature ending up in a particular user’s world is very small. But there is a mechanism to point to one of your ‘friends’ and get his/her creatures appearing in your world — and this provides the communication channel. Notice that the channel pulls content, which is what is needed in a covert setting. The person generating the content is the one likely to have attracted attention, so systems where that person has to overtly say whom to communicate with are less attractive for bad guys. It’s much harder to work out who might have asked for content from a channel, especially as some people who didn’t ask for it get it as well.

What kind of content can be sent along this channel? It isn’t a trivial process to transmit volumes of content. The simplest approach is to use each organism as a codeword. A more complex alternative is to use properties of organisms as alphabet symbols and send arbitrary content. And the way in which the civilization evolves can send a very simple signal about how much progress has been made on a particular task, one so subtle that it is extremely hard to notice. What could be more innocent than playing a game?

Reagan vs Obama and McCain

I thought it would be interesting to look at the level of spin in Reagan’s speeches. He shares some characteristics with Obama; not in political opinions but in his ability to motivate an audience, and to be resistant to potentially embarrassing factual issues.

Here is the plot from yesterday’s post comparing Obama and McCain since their conventions, with five campaign speeches (all I could find) by Reagan between the convention and the 1980 election.

Comparing the spin of Reagan, Obama, and McCain

The points with red stars are Reagan’s speeches. As you can see, his level of spin is much higher than either of today’s candidates. The ability to use high levels of spin without coming across as phony is, of course, what makes an actor, so this is not entirely surprising. And I’ve argued all along that high levels of spin pay off for a politician, and the ability to give high-spin speeches especially to people who do not already like you is a key asset for a politician. Reagan is a good example of this in action.

Spin scores to the end of October

Here is the analysis of levels of spin in Obama and McCain’s speeches up to a few days ago. Usual labelling (refer to previous posts for background).

Spin scores (red - McCain, blue - Obama)

The most obvious thing to see in this plot is how McCain’s speeches all tend to lie on one side of the deceptiveness axis while Obama’s tend to lie on the other side. This is because McCain has started using motion words at high rates (and Obama does not). This has a small effect on deceptiveness score, but rates of use of motion verbs are not all that important to signalling deception.

The individual levels of spin from the convention to the end of October are here:

McCain

Obama

The last time I posted was during the period where Obama’s level of spin was quite low. As you can see, it has risen sharply again in the past week. This suggests that he is not as confident of winning now as he was then — he has consistently shown a pattern of stepping out from his facade and using lower spin when he feels confident about winning.

Concluding thoughts on spin in the U.S. presidential election

When people think about spin and politicians, they usually assume that spin has to do with either presenting widely agreed facts in a way that puts a particular politician in the best light, or altering the facts by exaggerating or misremembering. Media people are always slightly puzzled when they reveal this kind of spin and find that it has little or no resonance with voters. Hillary Clinton’s memory of landing under sniper fire was a one-day wonder, not a deal-breaker. And there have been many other examples of this kind of spin on both sides during the U.S. presidential campaign, and they have had little impact.

This is because voters don’t choose politicians because of how clever their ideas are, whether they agree with these ideas, or even whether they are noticeably competent (history shows). They don’t listen to the candidates’ speeches and parse them for content. They vote for candidates with whom they feel some kind of resonance. And so they look for features of the candidates as people with which they can identify.

A simple way to say this is that voters look for character rather than policy. But this is still a bit misleading: they don’t think about character in moral terms, but in relationship terms. Not “Is this candidate a good person” but “is this candidate a bit like me”. Moral issues do come into play, but only for those voters to whom moral issues are part of their own self image.

The kind of spin I’ve been following through the U.S. election campaign addresses this issue of presentation: to what extent do candidates present themselves, as people, in ways that are not congruent with who they really are? In other words, to what extent do they present a persona or facade that is designed to appeal to a wider range of voters than the unadorned person would?

For both John McCain and Hillary Clinton, the short version is that they have, in general, presented something close to the real person. (This may, of course, be because they developed a political persona that they’ve been using so long that it has become the real them.) There have been ups and downs, and it’s been possible to see what might be going through the mind of the candidate and/or the campaign at certain critical moments, but overall they have presented a consistent persona that seems close to their real personality.

Barack Obama, on the other hand, has consistently presented a persona that does not seem to be very close to the real Obama, about whom we can only guess. This is starting to be more widely appreciated. I’ve heard several commentators say how they find him inscrutable because they simply can’t see the real person behind the presentation.

Spin is only one factor in how voters decide who to vote for, so I can’t make a prediction about who will win tomorrow. What I think is predictable is that, if Obama wins, his approval ratings will drop quite quickly when he becomes president — he simply cannot be all of the things that people are projecting on to him, many of them mutually contradictory. And, as president, his actions will speak louder and more clearly than his campaign speeches about who he really is.
