And so it begins

Stories out today that Google is now able to connect the purchasing habits of anyone it has a model for (i.e. almost everybody who’s ever been online) with Google’s own data on online activity.

For example, this story:

http://www.smh.com.au/technology/technology-news/google-now-knows-when-its-users-go-to-the-store-and-buy-stuff-20170523-gwbnza.html

Google says that this enables it to draw a line between the ads that users have been shown and the products that they buy. There’s a discrepancy in this story, though, because Google also claims that it doesn’t get the list of products purchased using a credit card, only the total amount. So a big hmmmmm.

(And if I were Google, I’d be concerned that there isn’t much of a link! Consumers might be less resentful if Google did indeed serve ads for things they wanted to buy, but everyone I’ve ever heard talk about online ads says the same thing: the ads either have nothing to do with their interests, or they are ads for things that they just bought.)

But connecting users to purchases (rather than ads to purchases) is the critical step to building a model of how much users are willing to pay — and this is the real risk of multinational data collection and analytics (as I’ve discussed in earlier posts).

Lessons from Wannacrypt and its cousins

Now that the dust has settled a bit, we can look more objectively at the Wannacrypt ransomware, and at the other malware exploiting the same vulnerability.

First, this attack vector existed because Microsoft, a long time ago, made a mistake in a file-sharing protocol (SMB). It was (apparently) exploited by the NSA, and then by others with less good intentions, but the vulnerability is all down to Microsoft.

There are three pools of vulnerable computers that played a role in spreading the Wannacrypt worm, as well as falling victim to it.

  1. Enterprise computers which were not being updated in a timely way because it was too complicated to maintain all of their other software systems at the same time. When Microsoft issues a patch, bad actors immediately try to reverse engineer it to work out what vulnerability it addresses. The last time I heard someone from Microsoft Security talk about this, they estimated it took about 3 days for this to happen. If you hadn’t updated in that time, you were vulnerable to an attack that the patch would have prevented. Many businesses judged the risk of disruption, from a patch interacting with their running systems, to be greater than the risk of delaying the update — but they may now have to re-evaluate that calculus!
  2. Computers running XP for perfectly rational reasons. Microsoft stopped supporting XP because they wanted people to buy new versions of their operating system (and often new hardware to be able to run it), but there are many, many people in the world for whom a computer running XP was a perfectly serviceable product, and who will continue to run it as long as their hardware keeps working. The software industry continues to get away with failing to warrant its products as fit for purpose, but that wouldn’t work in other industries. Imagine the discovery that the locks on a car stopped working after 5 years — could a manufacturer get away with claiming that the car was no longer supported? (Microsoft did, in this instance, release a patch for XP, but well after the fact.)
  3. Computers running unregistered versions of Microsoft operating systems (which therefore do not get updates). Here Microsoft is culpable for the opposite reason. People can run an unregistered version for years and years, provided they’re willing to re-install it periodically. It’s technically possible to prevent this kind of serial illegality, or at least to make it much more difficult.

The analogy is with public health. When there’s a large pool of unvaccinated people, the risk to everyone increases. Microsoft’s business decisions make the pool of ‘unvaccinated’ computers much larger than it needs to be. And while this pool is out there, there will always be bad actors who can find a use for the computers it contains.

Asymmetric haggling

I’ve pointed out before that the real threat to privacy comes not from governments, but from multinationals; and the key model of you that they want to build is how much you’re willing to pay for products and services. They can then use this information directly (airlines, Amazon) or sell it to others (Google).

We’re used to a world in which products and services cost the same for every buyer, but this world is rapidly disappearing. There’s a good article about the state of play in the May issue of the Atlantic:

https://www.theatlantic.com/magazine/archive/2017/05/how-online-shopping-makes-suckers-of-us-all/521448/

The price you’re quoted in an online setting already depends on the time of day, what platform you’re using, and your previous browsing history.

Of course, fixed prices are a relatively new invention, and haggling is still how prices are determined in much of the world. The difference in the online world is that the seller has much more data, and modelling capability, to work out what you’re willing to pay than you have about how cheaply the seller is willing to sell. So not only is pricing an adversarial process, it’s become a highly asymmetric one.

This is going to have all sorts of unforeseen consequences: analytics for buyers; buying surrogates; a return to bricks and mortar shopping for some people; less overall buying, … We’ll find out.

In the meantime, it’s always a good idea to shop online using multiple browsers on multiple platforms, deleting cookies and other trackers as much as you can.

Advances in Social Network Analysis and Mining Conference — Sydney

This conference will be in Sydney in 2017, from 31st July to 3rd August.

http://asonam.cpsc.ucalgary.ca/2017/

As well as the main conference, there is also a workshop, FOSINT: Foundations of Open Source Intelligence, which may be of even more direct interest to readers of this blog.

I will also be giving a tutorial on Adversarial Analytics as part of the conference.

Even more security theatre

I happened to visit a consulate to do some routine paperwork. Here’s the security process I encountered:

  1. Get identity checked from passport, details entered (laboriously) into online system.
  2. Cell phone locked away.
  3. Wanded by metal detection wand.
  4. Sent by secure elevator to another floor, to a waiting room with staff behind bullet-proof glass.

Here’s the thing: I got to carry my (unexamined) backpack with me through the whole process!

And what’s the threat from a cell phone in this context? Embarrassing pictures of the five-year-old posters on the wall of the waiting room?

I understand that government departments have difficulty separating serious risks from trivial ones, because if anything happened they would be blamed, regardless of how low-probability the risk was. But there’s no political reason not to make whatever precautions you do take actually helpful in reducing the perceived risks.

Secrets and authentication: lessons from the Yahoo hack

Authentication (I’m allowed to access something or do something) is based on some kind of secret. The standard framing of this is that there are three kinds of secrets:

  1. Something I have (like a device that generates 1-time keys)
  2. Something I am (like a voiceprint or a fingerprint), or
  3. Something I know (like a password).

There are problems with the first two mechanisms. Having something (a front door key) is the way we authenticate getting into our houses and offices, but it doesn’t transfer well to the digital space. Being something looks like it works better but suffers from the problem that, if the secret becomes widely known, there’s often no way to change the something (“we’ll be operating on your vocal cords to change your voice, Mr. Smith”). Which is why passwords tend to be the default authentication mechanism.

At first glance, passwords look pretty good. I have a secret, the password, and the system I’m authenticating with has another secret, the encrypted version of the password. Unfortunately, the system’s secret isn’t very secret: the encrypted version of my password is almost always transmitted in the clear, thanks to the prevalence of wifi. Getting from the system’s secret to mine is hard, which is supposed to prevent my secret being reverse engineered from the system’s.

The problem is that the space of possible passwords is small enough that the easy mapping, from my secret to the system’s, can be tried for all strings of reasonable length. So brute force enables the reverse engineering that was supposed to be hard. Making passwords longer and more random helps, but only at the margin.
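
To make that concrete, here is a minimal sketch of the brute-force step. It assumes, purely for illustration, that the system stores an unsalted SHA-256 hash and that passwords are short lowercase strings; real systems differ in the details, but the principle is the same, and all of the names here are hypothetical.

```python
# A hypothetical illustration of the brute-force attack sketched above.
import hashlib
import itertools
import string

def hash_password(password):
    # The 'easy' mapping from my secret (the password) to the system's
    # secret (here, an unsalted SHA-256 hash, purely for illustration).
    return hashlib.sha256(password.encode()).hexdigest()

def brute_force(target_hash, max_len=4):
    # Try every lowercase string up to max_len and compare hashes.
    for length in range(1, max_len + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(letters)
            if hash_password(guess) == target_hash:
                return guess
    return None

stolen = hash_password("cab")   # what a breach of the system would yield
print(brute_force(stolen))      # recovers 'cab' almost instantly
```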

We could make the secret a function instead of a string. As the very simplest example, the system could present me with a few small integers, and my authentication would be based on knowing that I’m supposed to add the first two and subtract the third. My response to the system is the resulting value. Your secret might be to add the first and the third and ignore the second.
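
A toy sketch of that challenge-response idea might look like the following; the rules and names are made up, and are only meant to show the shape of the scheme.

```python
import random

def my_rule(challenge):
    # My secret: add the first two numbers and subtract the third.
    a, b, c = challenge
    return a + b - c

def your_rule(challenge):
    # Your secret: add the first and the third, ignore the second.
    a, b, c = challenge
    return a + c

def authenticate(rule_on_file, challenge, response):
    # The system stores the rule I registered and checks my response against it.
    return rule_on_file(challenge) == response

challenge = tuple(random.randint(1, 9) for _ in range(3))
print(authenticate(my_rule, challenge, my_rule(challenge)))    # True
print(authenticate(my_rule, challenge, your_rule(challenge)))  # False, barring coincidence
```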

But limitations on what humans can compute on the fly mean that the space of functions can’t actually be very large, so this doesn’t lead to a practical solution.

Some progress can be made by insisting that I and the system have different secrets. Then a hack of the system, or of me by phishing, isn’t enough on its own to gain access to the system. There are a huge number of secret sharing schemes of varying complexity, but in the simplest example, my secret is a binary string of length n, and the system’s secret is another binary string of length n. We exchange encrypted versions of our strings, and the system authenticates me if the exclusive-or of its string and mine has a particular pattern. Usefully, I can also find out if the system is genuine by carrying out my own check. This pattern is (sort of) a third secret, but one that neither of us has to communicate, and so it is easier to protect.
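
Here is a toy version of that xor check, with illustrative 16-bit strings and hypothetical names. The exchange of encrypted strings is elided; this only shows the check each side performs once it can recover the other’s string.

```python
# Illustrative 16-bit secrets; real strings would be much longer.
my_secret     = 0b1011001110001101
system_secret = 0b0110101001100111
pattern       = my_secret ^ system_secret   # the 'third' secret, never transmitted

def system_checks_me(string_from_me):
    # The system xors its string with the one I sent and looks for the pattern.
    return (system_secret ^ string_from_me) == pattern

def i_check_system(string_from_system):
    # I can run the mirror-image check to see whether the system is genuine.
    return (my_secret ^ string_from_system) == pattern

print(system_checks_me(my_secret))            # True: I hold the right string
print(i_check_system(system_secret))          # True: the system is genuine
print(system_checks_me(0b1111000011110000))   # False: an impostor fails
```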

This system can be broken, but breaking it requires a brute force attack on the encrypted version of my secret, another on the encrypted version of the system’s secret, and then working out what function is applied to merge the two secrets (xor here, but it could be something much more complex). And that still doesn’t get access to the third secret.

Passwords are the dinosaurs of the internet age; secret sharing is a reasonable approach for the short to medium term, but (as I’ve argued here before) computing in compromised environments is still the best hope for the longer term.

Security theatre lives

Sydney tests its emergency notification system in the downtown core at the same time of day every time. So if a person wanted to cause an incident, guess what time they would choose?

It also seems to be done on Fridays, which is exactly the worst day to choose, since it’s the most common day for Islamist incidents.

Security theatre = doing things that sound like they improve security without actually improving it (and sometimes actually making it worse).