Posts Tagged 'Customs'

Provenance — a neglected aspect of data analytics

Provenance is defined by Merriam-Webster as “the history of ownership of a valued object or work of art or literature”, but the idea has much wider applicability.

There are three kinds of provenance:

  1. Where did an object come from? This kind of provenance is often associated with food and drink: country of origin for fresh produce, brand for other kinds of food, appellation d’origine contrôlée for French wines, and many other examples. This kind of provenance is usually signalled by something that is attached to the object.
  2. Where did an object go on its way from source to destination? This is actually the most common form of provenance historically — the way that you know that a chair really is a Chippendale is to be able to trace its ownership all the way back to the maker. A chair without provenance is probably much less valuable, even though it may look like a Chippendale, and the wood seems the right age. This kind of provenance is beginning to be associated with food. For example, some shipments now have temperature sensors attached to them that record the maximum temperature they ever encountered between source and destination. Many kinds of shipments have had details about their pathway and progress available to shippers, but this is now being exposed to customers as well. So if you buy something from Amazon you can follow its progress (roughly) from warehouse to you.
  3. The third kind of provenance is still in its infancy — what else did the object encounter on its way from source to destination? This comes in two forms. First, what other objects was it close to? This is the essence of Covid19 contact tracing apps, but it applies to any situation where closeness could be associated with poor outcomes. Second, were the objects that it was close to ones that were expected or made sense?

The first and second forms of provenance don’t lead to interesting data-analytics problems. They can be solved by recording technologies with, of course, issues of reliability, unforgeability, and non-repudiation.

But the third case raises many interesting problems. Public health models of the spread of infection usually assume some kind of random particle model of how people interact (with various refinements such as compartments). These models would be much more accurate if they could be based on actual physical encounter networks — but privacy quickly becomes an issue. Nevertheless, there are situations where encounter networks are already collected for other reasons: bus and train driver handovers, shift changes of other kinds, police-present incidents; and such data provides natural encounter networks. [One reason why Covid19 contact tracing apps work so poorly is that Bluetooth proximity is a poor surrogate for potentially infectious physical encounter.]

Customs also has a natural interest in provenance: when someone or something presents at the border, the reason they’re allowed to pass or not is all about provenance: hard coded in a passport, pre-approved by the issue of a visa, or with real-time information derived from, say, a vehicle licence plate.

Some clearly suspicious, but hard to detect, situations arise from mismatched provenance. For example, if a couple arrive on the same flight, then they will usually have been seated together; if two people booked their tickets or got visas using the same travel agency at the same time then they will either arrive on different flights (they don’t know each other), or they will arrive on the same flight and sit together (they do know each other). In other words, the similarity of provenance chains should match the similarity of relationships, and mismatches between the two signal suspicious behaviour. Customs data analytics is just beginning to explore leveraging this kind of data.
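The mismatch idea can be made concrete with a toy sketch. Everything here is illustrative — the event names, the Jaccard measure, and the threshold are my inventions, not features of any real customs system — but it shows the shape of the check: compare how similar two provenance chains are with whether a relationship was actually observed, and flag disagreement.

```python
def jaccard(a, b):
    """Similarity of two provenance chains, treated as sets of events."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def mismatch(chain_a, chain_b, related, threshold=0.5):
    """True when provenance similarity and observed relationship disagree."""
    similar = jaccard(chain_a, chain_b) >= threshold
    return similar != related

# Two travellers booked by the same (hypothetical) agency on the same day,
# arriving on the same flight, but claiming not to know each other:
a = ["agency:X", "booked:2020-03-01", "flight:AC101"]
b = ["agency:X", "booked:2020-03-01", "flight:AC101", "visa:rush"]
print(mismatch(a, b, related=False))  # near-identical chains, no relationship -> True
```

The same call with `related=True` returns `False`: matching chains and an acknowledged relationship are exactly what the provenance predicts.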

Backdoors to encryption — 100 years of experience

The question of whether those who encrypt data, at rest or in flight, should be required to provide a master decryption key to government or law enforcement is back in the news, as it is periodically.

Many have made the obvious arguments about why this is a bad idea, and I won’t repeat them.

But let me point out that we’ve been here before, in a slightly different context. A hundred years ago, law enforcement came up against the fact that criminals knew things that could (a) be used to identify other criminals, and (b) prevent other crimes. This knowledge was inside their heads, rather than inside their cell phones.

Then, as now, it seemed obvious that law enforcement and government should be able to extract that knowledge, and interrogation with violence or torture was the result.

Eventually we reached (in Western countries, at least) an agreement that, although there could be a benefit to the knowledge in criminals’ heads, there was a point beyond which we weren’t going to go to extract it, despite its potential value.

The same principle surely applies when the knowledge is on a device rather than in a head. At some point, law enforcement must realise that not all knowledge is extractable.

(Incidentally, one of the arguments made about the use of violence and torture is that the knowledge extracted is often valueless, since the target will say anything to get it to stop. It isn’t hard to see that devices can be made to use a similar strategy. They would have a PIN code or password that could be used under coercion and that would appear to unlock the device, but would in fact produce access only to a virtual subdevice which seemed innocuous. Especially as Customs in several countries are now demanding PINs and passwords as a condition of entry, such devices would be useful for innocent travellers as well as guilty — to protect commercial and diplomatic secrets for a start.)
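The coercion-PIN idea can be sketched in a few lines. This is a conceptual illustration only — the PINs and profile names are invented, and a real implementation would live in the device’s secure hardware, not application code — but it shows the behaviour: the duress PIN appears to succeed while exposing only a decoy subdevice.

```python
# Hypothetical PINs for illustration; real devices would store these
# (hashed) in secure hardware, not in source code.
REAL_PIN = "4921"
DURESS_PIN = "1111"

def unlock(pin):
    """Return the profile the device exposes for a given PIN.

    The duress PIN looks like a normal successful unlock to an observer,
    but opens only an innocuous decoy profile; any other wrong PIN is
    rejected outright.
    """
    if pin == REAL_PIN:
        return {"unlocked": True, "profile": "full"}
    if pin == DURESS_PIN:
        return {"unlocked": True, "profile": "decoy"}
    return {"unlocked": False, "profile": None}

print(unlock("1111"))  # appears unlocked, but only to the decoy profile
```

The key design point is that `unlock` gives the coercer no signal: the duress path returns the same “unlocked” result as the real one.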

Tips for adversarial analytics

I put together this compendium of things that are useful to know for those starting out in analytics for policing, signals intelligence, counterterrorism, anti-money-laundering, cybersecurity, and customs; and which might be useful to those using analytics when organisational priorities come into conflict with customers (as they almost always do).

Most of the content is either tucked away in academic publications, not publishable by itself, or common knowledge among practitioners but not written down.

I hope you find it helpful (pdf): Tips for Adversarial Analytics

Provenance — the most important ignored concept

I’ve been thinking lately about the problems faced by the Customs and Border Protection arms of government. The essential problem they face is: when an object (possibly a person) arrives at a border, they must decide whether or not to allow that object to pass, and whether to charge for the privilege.

The basis for this decision is never the honest face of the traveller or the nicely wrapped package. Rather, all customs organisations gather extra information to use. At one time, this was a document of some sort accompanying the object, and itself certified by someone trustworthy; increasingly it is data collected about the object before it arrived at the border, and also the social or physical network in which it is embedded. For humans, this might be data provided on a visa application, travel payment details, watch lists, and social network activity. For physical objects, this might be shipper and destination properties and history.

In other words, customs wants to know the provenance of every object, in as much detail as possible, and going back as far as possible. All of the existing analytic techniques in this space are approximating that provenance data and how it can be leveraged.

Other parts of government, for example law enforcement and intelligence, are also interested in provenance. For example, sleeper agents are those embedded in another country with a false provenance. The use of aliases is an attempt to break chains of provenance.

Ordinary citizens are wary of this kind of government data collection, feeling that society rightly depends on the existence of two mechanisms for deleting parts of our personal history: ignoring non-consequential parts (youthful hijinks, sealed criminal records) at least after they disappear far enough into the past; and forgiving other parts. There’s a widespread acceptance that people can change and it’s too restrictive to make this impossible.

The way we handle this today is that we explicitly delete certain information from the provenance, but make decisions as if the provenance were complete. This works all of the way from government level to the level of a marriage.

There is another way to handle it, though. That is to keep the provenance complete, but make decisions in ways that acknowledge explicitly that using some data is less appropriate. The trouble is that nobody trusts organisations to make decisions in this second way.

There’s an obvious link between blockchains and provenance, but the whole point of a blockchain is to make erasure difficult. So they will really only have a role to play in this story if we, collectively, adopt the second mechanism.

The issues become more difficult if we consider the way in which businesses use our personal information. Governments have a responsibility to implement social decisions and so are motivated to treat individual data in socially approved ways. Businesses have no such constraint — indeed they have a fiduciary responsibility to leverage data as much as they can. A business that knows data about me has no motivation to delete or ignore some of it.

This is the dark side of provenance. As I’ve discussed before, the ultimate goal of online businesses is to build models of everyone on the planet and how much they’re willing to pay for every product. In turn, sales businesses can use these models to apply differential pricing to their customers, and so increase their profits.

This sets up an adversarial situation, where consumers are trying to look like their acceptable price is much lower than it is. Having a complete, unalterable personal provenance makes this much, much more difficult. But by the time most consumers realise this, it will be too late.

In all of these settings, the data collected about us is becoming so all-encompassing and permanent that it is going to change the way society works. Provenance is an under-discussed topic, but it’s becoming a central one.

If you see something, say something — and we’ll ignore it

I arrived on a late evening flight at a Canadian airport that will remain nameless, and I was the second person into an otherwise deserted Customs Hall. On a chair was a cloth shoulder bag and a 10″ by 10″ by 4″ opaque plastic container. Being a good citizen, I went over to the distant Customs officers on duty and told them about it. They did absolutely nothing.

There are lessons here about predictive modelling in adversarial settings. The Customs officers were using, in their minds, a Bayesian predictor, which is the way that we, as humans, make many of our predictions. In this Bayesian predictor, the prior that the ownerless items contained explosives was very small, so the overall probability that they should act was also very small — and so they didn’t act.

Compare this to the predictive model used by firefighters. When a fire alarm goes off, they don’t consider a prior at all. That is, they don’t consider factors such as: a lot of new students just arrived in town, we just answered a hoax call to this location an hour ago, or anything else of the same kind. They respond regardless of whether they consider it a ‘real’ fire or not.
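The contrast between the two decision rules can be shown with toy numbers. All the probabilities below are invented for illustration; the point is only that under Bayes’ rule a tiny prior swamps even strong-looking evidence, while the firefighter rule ignores the prior entirely.

```python
# Illustrative probabilities (not real estimates of anything):
prior = 1e-6        # base rate: an unattended bag is actually a threat
likelihood = 0.9    # P(looks like this | threat)
evidence = 0.1      # P(looks like this) overall

# Bayesian rule: act only if the posterior clears a threshold.
posterior = prior * likelihood / evidence
bayesian_acts = posterior > 0.01
print(f"posterior = {posterior:.1e}, Bayesian officer acts? {bayesian_acts}")

# Firefighter rule: the alarm itself triggers the response; no prior at all.
firefighter_acts = True
print(f"firefighter acts? {firefighter_acts}")
```

However suggestive the evidence, the posterior here is around 10⁻⁵, so the Bayesian officer never acts; the firefighter responds to every alarm precisely because the rule conditions on the alarm, not on the prior.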

The challenge is how to train front-line defenders against acts of terror to use the firefighter predictive model rather than the Bayesian one. Clearly, there’s still some distance to go.