Posts Tagged 'spam'

The epidemiology of spam

As someone who’s had the same email address for nearly 40 years, I get a lot of spam. (Of course, almost all of it is automatically filtered away.)

It’s been noticeable that spam has been way down since January this year, and it became vanishingly rare once India was put on lockdown last week.

But this week it’s come roaring back as China once again opens for business. I guess we know where most of it comes from (and maybe spam has a role to play as a covid-19 detector — perhaps we can find out how many infections there really are in Iran, for example).

Canada’s Anti Spam — its one good feature spoiled

I commented earlier that the new Canadian Anti Spam law and Spam Reporting Centre were a complete waste of money because:

1. Spam is no longer a consumer problem, but a network problem which this legislation won’t help.
2. Most spammers are beyond the reach of Canadian law enforcement, even if attribution could be cleanly done.
3. There’s an obvious countermeasure for spammers — send lots of spam to the Spam Reporting Centre and pollute the data.

There was one (unintended) good feature, however. Legitimate businesses who knew my email address and therefore assumed, as businesses do, that I would love to get email from them about every imaginable topic, have felt obliged to ask for my permission to keep doing so. (They aren’t getting it!)

BUT all of these emails contain a link of the form “click here to continue getting our (um) marketing material”, because they’ve realised that nobody’s going to bother with a more cumbersome opt-in mechanism.

Guess what? Spear phishing attacks have been developed to piggyback on this flood of permission emails — I’ve had a number already this week. Since they appear to come from mainstream organisations and the emails look just like theirs, I’m sure they’re getting lots of fresh malware downloaded. So look out for even more botnets based in Canada. And thanks again, Government of Canada for making all of this possible.

Spam Reporting Centre

The Canadian government has decided to create a spam reporting centre (aka ‘The Freezer’) to address issues arising from cybercrime and communications fraud and annoyances of various kinds.

The idea cannot possibly work on technical grounds. More worryingly, it displays a lack of awareness of the realities of cybersecurity that is astounding.

The first peculiarity is that the Centre is supposed to address four problems: email spam, unsolicited phone calls, fake communications a la Facebook, and malware. Although these have a certain superficial similarity — they all annoy individuals — they do not raise the same kinds of technical issues underneath, and no one person could be an expert in detecting, let alone prosecuting, all of them. It’s a bit like trying to amalgamate the Salvation Army and the police force because they both wear uniforms and help people!

The Centre will rely on reports from individuals: get a spam email and forward it to the Centre, for example. One of the troubles with this idea is that individuals don’t usually have enough information to report such things in a useful way, and their reports don’t make good starting points for an eventual prosecution. Canada already has a way to report unsolicited phone calls, but it only works against callers who more or less keep the law by announcing who they are at the beginning of the call. The annoying (and illegal) robocalls can’t be reported because the person who gets them doesn’t know where they are coming from or who is making them. And where there are prosecutions, each person who reports such a call has to sign an affidavit attesting that the purported call actually happened, to provide the legal basis for the case.

The second, huge, problem with this idea is that, if individuals can report bad incidents, then spammers can also report fake bad incidents! And they can do it in such volume that investigators will have no way to distinguish the real from the fake. Creating fake spam emails, and evading mechanisms such as captchas that are meant to prevent wholesale reporting, is very easy.

There is also the deeper problem that besets all cybersecurity — attribution. It is always hard to trace cyberexploits back to their origins, and these origins are overwhelmingly likely to be computers taken over by botnets anyway. Working back along such chains to find someone to prosecute is tedious and expert work that depends on starting from as much information as possible.

The right way to address this problem is to set up honeytraps — machines and phones that seem to be ordinary but are instrumented so that, when an exploit happens, as much information as possible is collected at the time. Now there is a foundation for deciding which incidents are worth pursuing and starting out in pursuit with the best possible information. And, who knows, the knowledge that such systems are out there might dampen some of the enthusiasm on the part of the bad guys.

Low Hanging Fruit in Cybersecurity II

If cybersecurity exists to stop bad things happening in computing systems, then it seems to me that there are several implicit assumptions that underlie many approaches and techniques that might not be completely helpful. These are:

  • The distinction between “good” (or “allowable”) and “bad” is a binary distinction;
  • The decision about this distinction has to be made monolithically in a single step;
  • The distribution of likely things that could happen is uniform (flat).

Even to write them explicitly shows that they can’t quite be right, but nevertheless I suspect they exist, unexamined, in the design of many security systems.

What happens if we remove these assumptions?

If the distinction between “good” and “bad” is not discrete, then our systems instead allocate some kind of continuous risk or suspicion to actions. This creates an interesting new possibility — the decision about what to do about an action can now be decoupled from how the action is categorized. This is not even a possibility if the only distinction we recognize is binary.

From a purely technical point of view, this means that many different kinds of risk measuring algorithms can be developed and used orthogonally to decisions about what the outputs of these algorithms mean. Critical boundaries can be determined after the set of risks has been calculated, and may even be derived from the distribution of such risks. For example, bad things are (almost always) rare, so a list of actions ordered by risk will normally have a bulge of “normal” actions and then a small number of anomalous actions. The boundary could be placed at the edge of the bulge.
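The “edge of the bulge” idea can be sketched in a few lines: compute risk scores however you like, sort them, and put the boundary at the widest gap in the ordered list. The scores below are made-up illustrative values, not output from any real risk algorithm.

```python
# A minimal sketch: derive the decision boundary from the distribution
# of risk scores, instead of fixing a threshold in advance.

def boundary_from_gap(scores):
    """Return a threshold placed at the largest gap in sorted risk scores."""
    ordered = sorted(scores)
    # Find the pair of adjacent scores with the widest gap; the "bulge"
    # of normal actions sits below it, the rare anomalies above it.
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    _, i = max(gaps)
    return (ordered[i] + ordered[i + 1]) / 2

# Hypothetical risk scores: a bulge of normal activity plus two outliers.
risks = [0.11, 0.14, 0.09, 0.13, 0.12, 0.10, 0.15, 0.81, 0.93]
threshold = boundary_from_gap(risks)
flagged = [r for r in risks if r > threshold]
print(threshold, flagged)
```

Note that nothing here decides what “flagged” means; that decision stays decoupled, as described above.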

Second, what if the decision about whether to allow an action doesn’t have to be made all at once? Then systems can have defence in depth. The first, outer, layer can decide on the risk of a new action and decide whether or not to allow it. But it can be forgiving of potentially risky actions if there are further layers of categorization and defence to follow. What it can do is to disallow the clearly and definitively bad things, reducing the number of potentially bad things that have to be considered at later stages.

From a technical point of view, this means that weaker but cheaper algorithms can be used on the front lines of defence, with more effective but more expensive algorithms available for later stages (where they work with less data, and so do not cost as much overall, despite being more expensive per instance).

Third, what if our defence took into account that the landscape of expected actions is not uniform, so that low-probability events are automatically treated as more suspicious? For example, spam filtering does lots of clever things, but it doesn’t build a model of the sources of my email, and flag emails from countries that I’ve never, ever received email from as inherently more likely to be spam. (Yes, I know that sender addresses can be spoofed.)

This idea has been used in behavioral profiling of computer activity, and it sort of works. But it needs to be combined with the ideas above, so that actions can be rated along a continuum from: routine (allow), to unusual but still not that unusual (allow, but maybe with a user question or at least logged for occasional inspection), to very unusual (user better explicitly allow), to bizarre (disallow). Windows has a weak version of this, which hasn’t been accepted well by users, but it flags only one thing (program start) and it doesn’t build a model of typical behavior by each user.
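The continuum described above maps naturally onto a small set of graded responses. The cut points in this sketch are illustrative, not calibrated against any real system.

```python
# Sketch: map a continuous risk score onto the four graded responses
# described above, rather than a binary allow/deny.

def respond(risk):
    if risk < 0.3:
        return "allow"          # routine
    if risk < 0.6:
        return "allow-and-log"  # unusual, but still not that unusual
    if risk < 0.9:
        return "ask-user"       # very unusual: user better explicitly allow
    return "disallow"           # bizarre

print([respond(r) for r in (0.05, 0.4, 0.7, 0.95)])
```

The point is that the rating scale and the response policy are separate pieces: the cut points could be tuned per user, or derived from the risk distribution as above, without touching the rating algorithm.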

For example, the set of IP addresses with which my computer interacts is quite large, and hard to represent by some kind of convex structure, so intrusion detection doesn’t work very well if it depends on wrapping/categorising those IP addresses that are OK, and blocking traffic from those that are not. And usually the set of OK IP addresses is not derived from those I interact with, but encoded in some set of rules that apply to many computers. But if instead I built a model of the IP addresses I interact with, allowing older ones to get stale and disappear, and then looked at new IP addresses and allowed them if they resembled (tricky) those I already interact with, and asked me about the others, then this might work better than current approaches. An IP address is a hierarchical structure, with a possible country followed by the top octet, and so on, so I can discriminate quite finely about what it might mean. Even a web server that is theoretically visible to every other IP address could still benefit from handling unlikely source IP addresses differently.
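A toy version of the per-machine model described above: remember the prefixes of addresses I actually talk to, let unused ones go stale, and treat a new address as less suspicious the more of its hierarchy matches what I have seen before. The class name, the staleness window, and the addresses (RFC 5737 documentation ranges) are all illustrative.

```python
# Sketch of a per-machine IP familiarity model with staleness.
# "Resemblance" here is just shared dotted-decimal prefixes, which is
# the simplest possible reading of the hierarchical structure above.

import time

class AddressModel:
    def __init__(self, max_age_seconds=30 * 24 * 3600):
        self.seen = {}                 # prefix -> time last observed
        self.max_age = max_age_seconds # entries older than this go stale

    def _prefixes(self, ip):
        octets = ip.split(".")
        # "192.0.2.10" -> ["192", "192.0", "192.0.2"]
        return [".".join(octets[:n]) for n in (1, 2, 3)]

    def observe(self, ip, now=None):
        now = time.time() if now is None else now
        for p in self._prefixes(ip):
            self.seen[p] = now

    def familiarity(self, ip, now=None):
        """0 = entirely new; 3 = shares a /24 with known, fresh traffic."""
        now = time.time() if now is None else now
        score = 0
        for p in self._prefixes(ip):
            last = self.seen.get(p)
            if last is not None and now - last < self.max_age:
                score += 1
        return score

model = AddressModel()
model.observe("192.0.2.10", now=0)
print(model.familiarity("192.0.2.99", now=1))   # same /24 as known traffic
print(model.familiarity("203.0.113.5", now=1))  # never seen before
```

Low-familiarity addresses would then be routed to the “ask me” or “handle differently” responses rather than being silently allowed or blocked by a generic rule set.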

OK, maybe this isn’t exactly low hanging fruit, but the ideas are straightforward and (IMHO) should be built into the design of more robust systems.

Low Hanging Fruit in Cybersecurity

There’s a rapidly increasing interest in cybersecurity, partly because the penny is finally dropping about the financial impact on government and business, and not just individuals.

I don’t work directly in this area, but it is another adversarial domain, so some of the problems and approaches I think about have some applicability.

But cybersecurity does seem to be an area where some Pareto (80:20) thinking might not go amiss.

For example, botnets use machines that have been compromised but whose owners/users don’t realise this. In my experience there are two reasons for this:

  1. The owners don’t realise that things are not right with their machine and put anomalous network and disk traffic down to the general weirdness of computers;
  2. The machines do not really have an owner — they drive other bits of hardware, or they are shared among many people who each use them for a short time.

One thing that would really help would be for someone knowledgeable (not e.g. Lifehacker — although that would be a start — but someone from the cybersecurity community) to post a complete guide to hardening the major categories of PCs. Most people know that they should be running a spam filter and antivirus software, but from comments at a recent conference it seems that many people are still seeing spam, which suggests to me that they haven’t even got this simple part right. But there are many other tools that, if run on almost all machines, would cripple the ability to take them over. For example, I’ve been very happy with Iobit’s Security 360, which routinely finds malware on my machine behind a university “security system”. But I don’t use a VPN when I travel and I probably should. And what else should I be doing that I don’t even know is possible?

Volunteers anyone? (Of course, such a thing may exist, but I haven’t been able to find it.)