
‘AI’ performance not what it seems

As I’ve written about before, ‘AI’ tends to be misused to refer to almost any kind of data analytics or derived tool — but let’s, for the time being, go along with this definition.

When you look at the performance of these tools and systems, it’s often quite poor, but I claim we’re getting fooled by our own cognitive biases into thinking that it’s much better than it is.

Here are some examples:

  • Netflix’s recommendations for any individual user seem to overlap 90% with the ‘What’s trending’ and ‘What’s new’ categories. In other words, Netflix is recommending to you more or less what it’s recommending to everyone else. Other recommendation systems don’t do much better (see my earlier post on ‘The Sound of Music Problem’ for part of the explanation).
  • Google search results are quite good at returning, in the first few links, something relevant to the search query, but we don’t ever get to see what was missed and might have been much more relevant.
  • Google News produces what, at first glance, appear to be quite reasonable summaries of recent relevant news, but when you use it for a while you start to see how shallow its selection algorithm is — putting stale stories front and centre, and occasionally producing real howlers, weird stories from some tiny venue treated as if they were breaking and critical news.
  • Self-driving cars that perform well, but fail completely when they see certain patches on the road surface. Similarly, facial recognition systems that fail when the human is wearing a t-shirt printed with a particular adversarial patch.

The commonality between these examples, and many others, is that the assessment from use is necessarily one-sided: we get to see only the successes, not the failures. In other words (HT Donald Rumsfeld), we don’t see the unknown unknowns. As a result, we don’t know how well these ‘AI’ systems actually perform, or whether it’s safe to deploy them.

Some systems are ‘best efforts’ (Google News) and that’s fair enough.

But many of these systems are beginning to be used in consequential ways and, for that, real testing and real public test results are needed: not just true positives, but false positives and false negatives as well. There are two main flashpoints where this matters: (1) systems that are starting to do away with the human in the loop (self-driving cars, 737 MAXs); and (2) systems where humans are likely to say or think ‘The computer (or worse, the AI) can’t be wrong’, which are starting to include policing and security tools. Consider, for example, China’s social credit system. The fact that it gives low scores to some identified ‘trouble makers’ does not imply that everyone who gets a low score is a trouble maker — but this false implication lies behind this, and almost all, discussion of ‘AI’ systems.
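To see why the implication fails, here is a minimal back-of-the-envelope sketch in Python; the prevalence and accuracy figures are invented purely for illustration, not taken from any real system.

# Illustrative base-rate arithmetic: all numbers are invented.
prevalence = 0.01    # suppose 1% of the population really are 'trouble makers'
sensitivity = 0.95   # fraction of real trouble makers who get a low score
specificity = 0.95   # fraction of ordinary people who get a normal score

true_positives = prevalence * sensitivity                 # 0.0095
false_positives = (1 - prevalence) * (1 - specificity)    # 0.0495

# Probability that someone with a low score really is a trouble maker:
precision = true_positives / (true_positives + false_positives)
print(round(precision, 2))    # ~0.16

Even a seemingly accurate system, applied to a population in which the target behaviour is rare, mostly flags people who are not trouble makers.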


Huawei’s new problem

The Huawei Cyber Security Evaluation Centre (HCSEC) is a joint effort, between GCHQ and Huawei, to increase confidence in Huawei products for use in the UK. It’s been up and running since 2013.

Its 2018 report focused on the problem of reproducible builds: binaries compiled in China were not the same size as binaries built in the UK. To a computer scientist, this is a bad sign, since it suggests that the code contains conditional compilation statements such as:

#if COUNTRY_CODE == UK
    insert_backdoor();    /* schematic illustration of the worry, not real code */
#endif

In the intervening year, they have dug into this issue, and the answer they come up with is unexpected. It turns out that the problem is not a symptom of malice, but a symptom of incompetence. The code is simply not well enough engineered to produce consistent results.

Others have discussed the technical issues in detail:

https://www.theregister.co.uk/2019/03/28/hcsec_huawei_oversight_board_savaging_annual_report/

but here are some quotes from the 2019 report:

“there remains no end-to-end integrity of the products as delivered by Huawei and limited confidence on Huawei’s ability to understand the content of any given build and its ability to perform true root cause analysis of identified issues. This raises significant concerns about vulnerability management in the long-term”

“Huawei’s software component management is defective, leading to higher vulnerability rates and significant risk of unsupportable software”

“No material progress has been made on the issues raised in the previous 2018 report”

“The Oversight Board continues to be able to provide only limited assurance that the long-term security risks can be managed in the Huawei equipment currently deployed in the UK”

Not only is the code quality poor, but they see signs of attempts to cover up the shortcuts and practices that led to the issue in the first place.

The report is also scathing about Huawei’s efforts/promises to clean up its act; and they estimate a best-case timeline of five years to get to well-implemented code.

5G (whatever you take that to mean) will be at least ten times more complex than current networking systems. I think any reasonable computer scientist would conclude that Huawei will simply be unable to build such systems.

Canada, and some other countries, are still debating whether or not to ban Huawei equipment. This report suggests that such decisions can be depoliticised, and made purely on economic grounds.

But, from a security point of view, there’s still an issue — the apparently poor quality of Huawei software creates a huge threat surface that can be exploited by the governments of China (with or without Huawei involvement), Russia, Iran, and North Korea, as well as non-state actors and cyber criminals.

(Several people have pointed out that other network multinationals have not been scrutinised at the same depth and, for all we know, they may be just as bad. This seems to me implausible. One of the unsung advantages that Western businesses have is the existence of NASA, which has been pioneering reliable software for 50 years. If you’re sending a computer on a one-way trip to a place where no maintenance is possible, you pay a LOT of attention to getting the software right. The ideas and technology developed by NASA have had an influence on software engineering programs in the West that has tended to raise the quality of all of the software developed there. There have been unfortunate lapses whenever the idea that software engineering is JUST coding becomes popular (Windows 95, Android apps), but overall the record is reasonably good. Lots better than the glimpse we get of Huawei, anyway.)

Annular similarity

When similarity is used for clustering, then obviously the most similar objects need to be placed in the same cluster.

But when similarity is being used for human consumption, a different dynamic is in play — humans usually already know what the most similar objects are, and are interested in those that are (just) beyond those.

This can be seen most clearly in recommender systems. Purchase an item or watch a Netflix show, and your recommendation list will fill up with new objects that are very similar to the thing you just bought or watched.

From a strictly algorithmic point of view, this is a success — the algorithm found objects similar to the starting object. But from a human point of view it is a total failure, because it’s very likely that you, the human, already know about all of these recommended objects. If you bought something, you probably compared it with many or all of the objects that are now being recommended to you. If you watched something, the recommendations are still likely to be things you already knew about.

The misconception about what similarity needs to mean to be useful to humans is at the heart of the failure of recommender systems, and even the ad serving systems that many of the online businesses make their money from. Everyone has had the experience of buying something, only to have their ad feed (should they still see it) fill up with ads for similar products (“I see you just bought a new car — here are some other new cars you might like”).

What’s needed is annular similarity — a region that is centred at the initial object, but excludes new objects that are too similar, and focuses instead on objects that are a bit similar.
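A minimal sketch of the idea, assuming only a pairwise distance function and a pair of hand-chosen radii (and choosing those radii well is exactly the hard part, as discussed below):

def annular_neighbours(target, objects, distance, r_inner, r_outer):
    # Keep objects that are 'a bit similar' to the target: further away
    # than r_inner (dropping the near-duplicates the user already knows
    # about) but closer than r_outer (dropping the irrelevant).
    return [x for x in objects if r_inner < distance(target, x) <= r_outer]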

Amazon tries to do this via “People who bought this also bought” which can show useful add-on products. (They also use “People who viewed this also viewed” but this is much less effective because motivations are so variable.) But this mechanism also fails because buying things together doesn’t necessarily mean that they belong together — it’s common to see recommendations based on the fact that two objects were on special on the same day, and so more likely to be bought together because of the opportunity, rather than any commonality.

Annular similarity is also important in applications that help humans to learn new things: web search, online courses, intelligence analysis. That’s why we built the ATHENS divergent web search engine (refs below) — give it some search terms and it returns (clusters of) web pages containing information that is just over the horizon from the search terms. We found that this required two annuli: we first constructed the information implicit in the search terms; then an annulus around it of information that we assumed would already be known to anyone who knew that core; and only then a second annulus, containing the results actually returned.

We don’t know many algorithmic ways to find annular similarity. In any distance-based clustering it’s possible, of course, to define an annulus around any point. But it’s tricky to decide what the inner and outer radii should be, the calculations have to happen in high-dimensional space where the points are very sparse, and it’s not usually clear whether the space is isotropic.

Annular similarity doesn’t work (at least straightforwardly) in density-based clustering (e.g. DBSCAN) or distribution-based clustering (e.g. EM) because the semantics of ‘cluster’ doesn’t allow for an annulus.

One way that does work (and was used extensively in the ATHENS system) is based on singular value decomposition (SVD). An SVD projects a high-dimensional space into a low-dimensional one in such a way as to preserve as much of the variation as possible. One of its useful side-effects is that a point that is similar to many other points tends to be projected close to the origin; and a point that is dissimilar to most other points also tends to be projected close to the origin, because the dimension(s) it inhabits have little variation and tend to be projected away. In the resulting low-dimensional projection, points far from the origin tend to be interestingly dissimilar to those at the centre of the structure — and so an annulus imposed on the embedding tends to find an interesting set of objects.
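A minimal sketch of that approach, assuming an objects-by-attributes matrix and hand-picked radii (an illustration of the idea, not the actual ATHENS implementation):

import numpy as np

def annulus_by_svd(A, k=2, r_inner=0.5, r_outer=1.5):
    # Project the rows of A (objects x attributes) into k dimensions,
    # preserving as much of the variation as possible.
    A = A - A.mean(axis=0)                   # centre, so the origin is meaningful
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coords = U[:, :k] * s[:k]                # k-dimensional embedding of each object
    # Very common and very rare objects both land near the origin;
    # interestingly dissimilar objects land further out.
    radii = np.linalg.norm(coords, axis=1)
    return np.where((radii > r_inner) & (radii < r_outer))[0]   # indices in the annulus

The radii still have to be chosen by hand (or, say, from quantiles of the observed distances), which is the same unresolved problem as before.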

Unfortunately this doesn’t solve the recommender system problem because recommenders need to find similar points that have more non-zeroes than the initial target point — and the projection doesn’t preserve this ordering well. That means that the entire region around the target point has to be searched, which becomes expensive.

There’s an opportunity here to come up with better algorithms to find annular structures. Success would lead to advances in several diverse areas.

(A related problem is the Sound of Music problem: the tendency for a common/popular object to muddle the similarity structure of all of the other objects because of its weak similarity to all of them. The Sound of Music plays this role in movie recommendation systems; think of wrapping paper as playing the same role in the context of Amazon. I’ve written about this in a previous post.)

 

Tracy A. Jenkin, Yolande E. Chan, David B. Skillicorn, Keith W. Rogers: Individual Exploration, Sensemaking, and Innovation: A Design for the Discovery of Novel Information. Decision Sciences 44(6): 1021-1057 (2013).

Tracy A. Jenkin, David B. Skillicorn, Yolande E. Chan: Novel Idea Generation, Collaborative Filtering, and Group Innovation Processes. ICIS 2011.

David B. Skillicorn, Nikhil Vats: Novel information discovery for intelligence and counterterrorism. Decision Support Systems 43(4): 1375-1382 (2007).

Nikhil Vats, David B. Skillicorn: Information discovery within organizations using the Athens system. CASCON 2004: 282-292.

 

China-Huawei-Canada fail

Huawei has been trying to convince the world that they are a private company with no covert relationships to the Chinese government that might compromise the security of their products and installations.

This attempt has been torpedoed by the Chinese ambassador to Canada, who today threatened ‘retaliation’ if Canada joins three of the Five Eyes countries (and a number of others) in banning Huawei from provisioning 5G networks. (The U.K. hasn’t banned Huawei equipment, but BT is uninstalling it, and the unit set up jointly by Huawei and GCHQ to try to alleviate concerns about Huawei’s hardware and software has recently reported that it’s less certain about the security of these systems now than it was when the process started.)

It’s one thing for a government to act as a booster for national industries — it’s another to deploy government force directly.

China seems to have a tin ear for the way that the rest of the world does business; it can’t help but hurt them eventually.

The cybercrime landscape in Canada

Statscan recently released the results of their survey of cybercrime and cybersecurity in 2017 (https://www150.statcan.gc.ca/n1/pub/71-607-x/71-607-x2018007-eng.htm).

Here are some of the highlights:

  • About 20% of Canadian businesses had a cybersecurity incident (that they noticed). Of these, around 40% had no detectable motive, another 40% were aimed at financial gain, and around 23% were aimed at getting information.
  • More than half of these incidents had enough of an impact to stop the business operating for at least a day.
  • Rates of incidents were much higher in the banking sector and pipeline transportation sector (worrying), and in universities (not unexpected, given their need to operate openly).
  • About a quarter of businesses don’t use an anti-malware tool, about a quarter do not have email security (it’s not clear what this means, but presumably antivirus scanning of incoming email, and maybe exfiltration protection), and almost a third do not have network security. These are terrifying numbers.

Relatively few businesses have a policy for managing and reporting cybersecurity incidents; vanishingly few have senior executive involvement in cybersecurity.

It could be worse, but this must be disappointing to those in the Canadian government who’ve been developing and pushing out cyber awareness.

People in glass houses

There’s a throwaway line in Woodward’s book about the Trump White House (“Fear”, Simon and Schuster, 2018) where he says that the senior military were unwilling to carry out offensive cyber operations because they didn’t think the US would fare well under retaliation.

Then this week the GAO came out with a report on cybersecurity in DOD weapons systems (as opposed to DOD networks). It does not make happy reading. (Full report).

Here’s what seems to me to be the key quotation:

“We found that from 2012 to 2017, DOD testers routinely found mission critical cyber vulnerabilities in nearly all weapon systems that were under development. Using relatively simple tools and techniques, testers were able to take control of these systems and largely operate undetected”

Almost every word of that could be italicized, and even a row of added exclamation marks would hardly suffice.

To be fair, some of these systems are still under development. But the report makes clear that, for many of them, cybersecurity was not really considered in their design. The typical assumption was that weapons systems are standalone. But in a world where software runs everything, there has to be a mechanism for software updates at least, and so a connection to the outside world. As the Iranians discovered, even updating from a USB drive is not attack-proof. And security is a difficult property to retrofit, so these systems will never be as resistant to cyberattack as we might all have wished.

Predicting fraud risk from customer online properties

This interesting paper presents the results of an investigation into how well a digital footprint (the properties associated with a customer’s online interactions with a business, such as platform and time of day) can be used to predict the risk of non-payment for a pay-on-delivery shopping business.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3163781

The properties that are predictive are not, by themselves, all that surprising: those who shop in the middle of the night are higher risk, those who come from price-comparison web sites are lower risk, and so on.

What is surprising is that, overall, predictive performance rivals, and perhaps exceeds, risk prediction from FICO (i.e. credit scores) — but these properties are much easier to collect, and the model based on them can be applied to those who don’t have a credit score. What’s more, the digital footprint and FICO-based models are not very correlated, and so using both does even better.
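As a sketch of how little machinery such a model needs (the file and column names here are hypothetical, and a real deployment would need proper validation and calibration):

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical digital-footprint features of the kind the paper describes:
# hour of the order, device/platform, and the channel the customer came from.
df = pd.read_csv("orders.csv")
X = pd.get_dummies(df[["order_hour", "platform", "referrer_channel"]],
                   columns=["platform", "referrer_channel"])
y = df["defaulted"]    # 1 = the customer never paid

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))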

The properties that make up the digital footprint are so easy to collect that almost any online business or government department could (and should) be using them to get a sense of their customers.

I’ve heard (but can’t find a reference) that Australian online insurance quotes vary in price based on the time of day they are requested — I suppose based on an intuition that procrastination is correlated with risky behaviour. I’d be grateful if anyone has details of any organisation that is using this kind of predictive model for their customers.

Businesses like Amazon require payment up front — but they also face a range of risks, including falsified payments (stolen credit cards) and delivery hijacking. Approaches like this might well help in detecting these other kinds of fraud.

