Republican candidates’ debate: persona deception results

Here are results from the first Republican debate, combining the early and prime-time material into a single corpus.

There’s more detail about the theory in the previous post, but the basic story is: an election campaign is a socially sanctioned exercise in deception; factual deception is completely discounted and so doesn’t matter; the interesting question is the deception required of each candidate to present themselves as better than they really are; and the candidate who can implement this kind of deception best tends to be the winner. Note that, although deception often has negative connotations, there are many situations where it is considered appropriate, allowed, or condoned: negotiation, dating, selling and marketing; campaigns are just a different kind of marketing. In the political context this is sometimes called “spin”, but it’s really more subtle than that.

The basic plot shows the variation in level of deception, aggregated over all of the turns by each candidate during the debate. The line is the deception axis; the further towards the red end, the stronger the deception. The remaining variation comes from differences in which words of the model each candidate used, that is, differences of style.

[Figure: deceptdocs, candidates plotted along the deception axis]

These results aren’t terribly surprising. Both Fiorina and Huckabee have broad media experience and so are presumably good at presenting a facade appropriate to many different occasions (and no wonder Fiorina is widely regarded as having “won” the early debate). Trump has low levels of deception — that’s partly because he doesn’t bother with a facade, and partly because the more well-known a person is, the harder it is to successfully present a different facade.

Note, again unsurprisingly, that Carson, while in the middle of the pack on the deception axis, has quite different language patterns from any of the others. That’s partly opportunity (he wasn’t asked the same kind of questions) and partly a consequence of not being a professional politician.

[Figure: deceptdocszoom, zoomed view of the centre of the previous plot]

This figure zooms in to show the structure of the pack in the centre. There isn’t a lot of difference, which reinforces the takeaway that these debates didn’t make a lot of difference, positively or negatively, for most of the candidates.

The contributions of individual words to the ranking can be examined by drilling down into this figure:

[Figure: wordpattern, heatmap of word contributions by candidate]

The rows are candidates in alphabetical order (Fiorina 5, Huckabee 8, Perry 13, Trump 15), the columns are the 42 words of the deception model that were actually used, in decreasing order of overall frequency, and the blocks are darker in colour when a word used by a candidate makes a greater contribution to the model. The top words were: I, but, going, my, me, or, go, take, look, lead, run, rather, without, move, and hate. So Huckabee’s high score comes primarily from low use of first-person singular pronouns, while Fiorina’s comes from heavier use of lower-ranked words that most others didn’t use. There are qualitative similarities between Fiorina’s language and Carson’s (row 2).

In previous presidential election campaigns, the candidate who managed to present the best facade in the strongest way was the winner.

A separate question is: what kind of facade should a candidate choose? We have empirical results about that too. A winning persona is characterised by: ignoring policy issues completely, ruthlessly eliminating all negative language, using plenty of positive language, and ignoring the competing candidates. Although, at one level, this seems obvious, no candidate and no campaign can bring themselves to do it until their second presidential campaign. Not only does this predict the winner; the margin of victory is also predictable from it.

Canadian election 2015: Leaders’ debate

Regular readers will recall that I’m interested in elections as examples of the language and strategy of influence — what we learn can be applied to understanding jihadist propaganda.

The Canadian election has begun, and last night was the first English-language debate by the four party leaders: Stephen Harper, Elizabeth May, Thomas Mulcair, and Justin Trudeau. Party leaders do not get elected directly, so all four participants had trouble wrapping their minds around whether they were speaking as party spokespeople or as “presidential” candidates.

Deception is a critical part of election campaigns, but not in the way that people tend to think. Politicians make factual misstatements all the time, but it seems that voters have already baked this in to their assessments, and so candidates pay no penalty when they are caught making such statements. This is annoying to the media outlets that use fact checking to discover and point out factual misstatements, because nobody cares, and they can’t figure out why.

Politicians also try to present themselves as smarter, wiser, and generally more qualified for the position for which they’re running, and this is a much more important kind of deception. In a fundamental sense, this is what an election campaign is — a Great White Lie. Empirically, the candidate who is best at this kind of persona deception tends to win.

Therefore, measuring levels of deception is a good predictor of the outcome of an election. Recall that deception in text is signalled by (a) reduced use of first-person singular pronouns, (b) reduced use of so-called exclusive words (“but”, “or”) that introduce extra complexity, (c) increased use of action verbs, and (d) increased use of negative-emotion words. This model can be applied by counting the number of occurrences of these words, adding them up (with appropriate signs), and computing a score for each document. But it turns out to be much more effective to add a step that weights each word by how much its use varies across the set of documents being considered, and to compute this weighted score instead.
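To make the mechanics concrete, here is a minimal sketch of this kind of weighted scoring in Python. The word lists, the per-1000-word rates, and the standard-deviation weighting are illustrative assumptions for the sketch, not the exact model behind these results.

    import numpy as np
    from collections import Counter

    # Illustrative word lists: assumptions for the sketch, not the model's actual vocabulary
    FIRST_PERSON = {"i", "me", "my", "mine"}               # deception: used less
    EXCLUSIVE = {"but", "or", "without", "except"}         # deception: used less
    ACTION = {"go", "going", "take", "run", "move"}        # deception: used more
    NEGATIVE = {"hate", "worthless", "enemy"}              # deception: used more
    VOCAB = sorted(FIRST_PERSON | EXCLUSIVE | ACTION | NEGATIVE)

    def word_rates(text):
        """Rate per 1000 words of each model word in one document."""
        tokens = text.lower().split()
        counts = Counter(tokens)
        n = max(len(tokens), 1)
        return [1000.0 * counts[w] / n for w in VOCAB]

    def deception_scores(documents):
        """Weighted deception score for each document; higher means more deceptive."""
        # signs: -1 for words whose *low* use signals deception, +1 for the rest
        signs = np.array([-1.0 if w in FIRST_PERSON or w in EXCLUSIVE else 1.0
                          for w in VOCAB])
        rates = np.array([word_rates(d) for d in documents])
        # weight each word by how much its rate varies across this set of documents
        weights = rates.std(axis=0)
        return (rates * signs * weights).sum(axis=1)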

So, I’ve taken the statements by each of the four candidates last night, and put them together into four documents. Then I’ve applied this deception model to these four documents, and ranked the candidates by levels of deceptiveness (in this socially acceptable election-campaign meaning of deceptiveness).

[Figure: wordseffects, word contributions by leader]

This figure shows, in the columns, the intensity of the 35 model words that were actually used, in decreasing frequency order. The rows are the four leaders in alphabetical order: Harper, May, Mulcair, Trudeau; and the colours show the intensity of each leader’s use of each word. The top few words are: I, but, going, go, look, take, my, me, taking, or. But remember, a large positive value means a strong contribution of this word to deception, not necessarily a high frequency; so the brown bar in column 1 of May’s row indicates a strong contribution from the word “I”, which actually corresponds to low rates of “I”.

[Figure: deceptdocs, the four leaders plotted along the deception axis]

This figure shows a plot of the variation among the four leaders. The line is oriented from most deceptive to least deceptive, so deception increases from the upper right towards the lower left.

Individuals appear in different places because of different patterns of word use. Each leader’s point can be projected onto this line to generate a (relative) deception score.

May appears at the most deceptive end of the spectrum. Trudeau and Harper appear at almost the same level, and Mulcair appears significantly lower. The black point represents an artificial document in which each word of the model is used at one standard deviation above neutral; in other words, a document that is quite deceptive.

You might conclude from this that May managed much higher levels of persona deception than the other candidates and so is destined to win. There are two reasons why her levels are high: first, she said much less than the other candidates, so her results are distorted by the necessary normalizations; and second, she used “I” far less often than the others. Her interactions were often short as well, reducing the opportunities for some kinds of words, notably the exclusive words, to be used at all.

Mulcair’s levels are relatively low because he took a couple of opportunities to talk autobiographically. This seems intuitively to be a good strategy (appeal to voters with a human face) but unfortunately it tends not to work well. To say “I will implement a wonderful plan” invites the hearer to disbelieve that the speaker actually can; saying instead “We will implement a wonderful plan” makes the hearer’s disbelief harder because they have to eliminate more possibilities; and saying “A wonderful plan will be implemented” makes it a bit harder still.

It’s hard to draw strong conclusions in the Canadian setting because elections aren’t as much about personalities. But it looks as if this leaders’ debate might have been a wash, with perhaps a slight downward nudge for Mulcair.

Time to build an artificial intelligence (well, try anyway)

NASA is a divided organisation, with one part running uncrewed missions, and the other part running crewed missions (and, apparently, internal competition between these parts is fierce). There’s no question that the uncrewed-mission part is advancing both science and solar system exploration, and so there are strong and direct arguments for continuing in this direction. The crewed-mission part is harder to make a case for; the ISS spends most of its time just staying up there, and has been reduced to running high-school science experiments. (Nevertheless, there is a strong visceral feeling that humans ought to continue to go into space, which is driving the various proposals for Mars.)

But there’s one almost invisible but extremely important payoff from crewed missions (no, it’s not Tang!): reliability. You only have to compare the level of response to last week’s failed liftoff of a supply rocket to the ISS with the responses to accidents involving human deaths. When an uncrewed rocket fails, there’s an engineering investigation leading to a significant improvement in the vehicle or processes. When there’s a crewed rocket failure, it triggers a phase shift in the whole of NASA, something that’s far more searching. The Grissom/White/Chaffee accident triggered a complete redesign of the way life support systems worked; the Challenger disaster caused the entire launch approval process to be rethought; and the Columbia disaster led to a new regime of inspections after reaching orbit. Crewed missions cause a totally different level of attention to safety, which in turn leads to a new level of attention to reliability.

Some changes only get made when there’s a really big, compelling motivation.

Those of us who have been flying across oceans for a while cannot help but be aware that 4-engined aircraft have almost entirely been replaced by 2-engined aircraft, and that spare engines strapped under the wing, being ferried out to replace an engine that failed at some remote location, are no longer seen either. This is just one area where increased reliability has paid off. Cars routinely last for hundreds of thousands of kilometres, which only a few high-end brands managed forty years ago. There are, of course, many explanations, but the crewed space program is part of the story.

What does this have to do with analytics? I think the time is right to try to build an artificial intelligence. The field known as “artificial intelligence” or machine learning or data mining or knowledge discovery has been notorious for overpromising and underdelivering since at least the 1950s, so I need to be clear: I don’t think we actually know enough to build an artificial intelligence, but I think the attempt has potential as an organising principle for a large amount of research that is otherwise unfocused.

One reason why I think this might be the right time is that deep learning has shown how to solve (at least potentially) some of the roadblock problems (although I don’t know that the neural network bias in much deep learning work is necessary). Consciousness, which was considered an important part of intelligence, has turned out to be a bit of a red herring. Of course, we can’t tell what the next hard part of the problem will turn out to be.

The starting problem is this: how can an algorithm identify when it is “stuck” and “go meta”? (Yes, I know that computability informs us that there isn’t a general solution, but we, as humans, do this all the time, so how can we encode this mechanism algorithmically, at least as an engineering strategy?) Any level of success here would pay off immediately in analytics platforms, which are really good at building models, and even at assessing them, but hopeless at knowing when to revise them.
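As a toy illustration (my own sketch, not anything from the literature), here is what a modelling loop that monitors its own progress and escalates when it stalls might look like; the improve, score, and go_meta functions are placeholders.

    import random

    def improve(model):
        """Placeholder for one step of ordinary model refinement."""
        return model + random.gauss(0.0, 0.1)

    def score(model):
        """Placeholder objective; higher is better."""
        return -abs(model - 3.0)

    def go_meta(model):
        """Placeholder for a change of strategy: restart, re-represent, switch model class."""
        return random.uniform(-10.0, 10.0)

    def fit(initial, steps=200, patience=10, tolerance=1e-3):
        model, best, stalled = initial, score(initial), 0
        for _ in range(steps):
            candidate = improve(model)
            if score(candidate) > best + tolerance:
                model, best, stalled = candidate, score(candidate), 0
            else:
                stalled += 1
            if stalled >= patience:      # "stuck": no measurable progress for a while
                model = go_meta(model)   # escalate instead of grinding on
                best, stalled = score(model), 0
        return model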

Refining “Data Science”

Regular readers will know that I have been thinking about the constellation of ideas that are getting a lot of play in universities and the research community around words like “data science” and “big data”, and especially the intersection of these ideas with the other constellation of “data mining”, “knowledge discovery”, and “machine learning”.

I’ve argued that inductive model discovery (which I think is the core of all of these ideas) is a new way of doing science that is rapidly replacing the conventional Enlightenment or Popperian view of science. This is happening especially quickly in fields that have struggled to apply the conventional scientific method: medicine, the social “sciences”, business schools, and the humanities.

Attending the International Conference on Computational Science made me realise, however, that computational science is part of this story as well.

Here’s how I see the connections between these three epistemologies:

  1. Conventional science. Understand systems via controlled experiments: setting up configurations that differ in only a few managed ways and seeing whether those differences correspond to different system behaviours. If they do, construct an “explanation”; if they don’t, it’s back to the drawing board.
  2. Computational science. Understand systems by building simulations of them and tweaking the simulations to see if the differences are those that are expected from the tweaks. (Simulations increase the range of systems that can be investigated when either the tweaks can’t be done on the real system, or when the system is hypothesised to be emergent from some simpler pieces.)
  3. Data science. Understand systems by looking at the different configurations that naturally occur and seeing how these correspond to different system behaviours. When they do, construct an “explanation”.

In other words, conventional science pokes the system being investigated in careful ways and sees how it reacts; computational science creates a replica of the system and pokes that; and data science looks at the system being poked and tries to match the reactions to the poking.

Underlying these differences in approach are also, of course, differences in validation: how one tells whether an explanation is sufficient. The first two both start from a hypothesis and use statistical machinery to decide whether the hypothesis is supported sufficiently strongly. The difference is that computational science has more flexibility to set up controlled experiments and so, all things considered, can get stronger evidence. (But there is always the larger question of whether the simulation actually reproduces the system of interest: critical, but often ignored, and with huge risks of “unknown unknowns”.) Data science, in contrast, validates its models of the system being studied by approaches such as the use of a test set, a part of the system’s data that was not used to build the model but that should behave as the original system did. It is also buttressed by the ability to generate multiple models and compare among them.
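Here is a small, generic sketch of that test-set style of validation, with made-up data and two placeholder models standing in for the multiple models a real analysis would compare:

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up observations of a system we did not get to experiment on
    X = rng.normal(size=(500, 3))
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500)

    # Hold out a test set: behaviour the models never get to see
    idx = rng.permutation(len(y))
    train, test = idx[:400], idx[400:]

    def fit_linear(X, y):
        """Least-squares fit with an intercept."""
        A = np.column_stack([X, np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return lambda Z: np.column_stack([Z, np.ones(len(Z))]) @ coef

    def fit_mean(X, y):
        """A trivial competing model: always predict the training mean."""
        m = y.mean()
        return lambda Z: np.full(len(Z), m)

    # Build several models, then compare them on data they were not built from
    for name, fit in [("linear", fit_linear), ("mean", fit_mean)]:
        model = fit(X[train], y[train])
        rmse = np.sqrt(np.mean((model(X[test]) - y[test]) ** 2))
        print(f"{name:6s} test RMSE: {rmse:.3f}")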

Data science is advancing on two fronts: first, the flexibility it provides to conventional science not to have to construct carefully balanced controlled experiments; second, and much more significantly, the opportunity it creates for making scientific progress in the social sciences and humanities, replacing “qualitative” by “quantitative” in unprecedented ways.

Spectral graph embedding doesn’t work on an adjacency matrix

I’ve heard several talks at conferences in the past few weeks where someone has run an eigendecomposition or SVD on an adjacency matrix and assumed that the embedding they end up with is meaningful. Some of them noticed that this embedding didn’t represent their graph very well. There’s a simple explanation for that — it’s wrong. In this post I’ll try and explain why.

For a graph with n nodes, an adjacency matrix is an n×n matrix whose (i,j) entry represents the weight of the edge connecting node i and node j. The entries of this matrix are all non-negative and the matrix must be symmetric. (There are ways to handle non-symmetric matrices, that is, directed graphs, but they require significantly more care to embed appropriately.)

Now remember, eigendecompositions or SVDs are numeric algorithms that don’t know that the content of this matrix represents a graph. They regard the rows of the adjacency matrix as vectors in an n-dimensional vector space — and this view does not fit very well with the graph that this matrix is representing. For example, a well-connected node in the graph has a corresponding row with many non-zero entries; as a vector, then, it is quite long and so the point corresponding to its end is far from the origin. A poorly connected node, on the other hand, has mostly zero entries in its row, so it corresponds to a short vector. All of the entries of the adjacency matrix are non-negative, so all of these vectors are in the positive hyperquadrant.
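A quick numerical check (on a random graph standing in for real data) makes the geometry explicit: for a 0/1 adjacency matrix, a node’s distance from the origin in this row-vector view is exactly the square root of its degree, so the hubs end up on the outside of the cloud.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200

    # A random symmetric 0/1 adjacency matrix (a stand-in for a real graph)
    A = (rng.random((n, n)) < 0.05).astype(float)
    A = np.triu(A, 1)
    A = A + A.T

    degrees = A.sum(axis=1)                   # how well connected each node is
    row_lengths = np.linalg.norm(A, axis=1)   # distance of each row-vector from the origin

    # For a 0/1 matrix these are locked together: ||row|| = sqrt(degree),
    # so the best-connected nodes sit on the outside of the cloud.
    print(np.allclose(row_lengths, np.sqrt(degrees)))   # True
    print(np.corrcoef(degrees, row_lengths)[0, 1])      # close to 1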

The cloud of points corresponding to the graph therefore looks like this figure:

[Figure: graphcloud]

where the red area represents the well-connected nodes of the graph. The eigendecomposition/SVD of this cloud corresponds to a rotation to new axes and (usually) a projection to a lower-dimensional space. There are several problems with this.

First, the well-connected nodes are on the outside of the cloud, but they should be in the middle — they are important and so should be central. Second, the well-connected nodes should be close to one another in general but they are spread along the outer shell of the cloud. In other words, the cloud derived from the adjacency matrix is inside-out with respect to the natural and expected structure of the graph. Any embedding derived from this cloud is going to inherit its inside-out structure and so will be close to useless.

There is also an equally serious issue: the direction of the first eigenvector of such a cloud will be the vector from the origin to ‘the center of the cloud’ because the numerically greatest variation is between the origin and this center. This vector is shown as black in the figure. So far so good: projection onto this vector does indeed provide an importance ranking for the graph nodes, with the most important projected onto the end away from the origin.

However, the second and subsequent axes are necessarily orthogonal to this first axis — but directions orthogonal to it do not tell us anything about the variation within the cloud. If we took exactly the same shaped cloud and moved it a little in the positive hyperquadrant, the first axis would change, forcing changes in all of the other axes, but the shape of the cloud has not changed! In other words, all of the axes after the first are meaningless as measures of variation in the graph.
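Again, a toy check with random data makes the point: translating a cloud that sits away from the origin changes the axes an uncentred SVD recovers, even though the cloud’s shape is untouched, whereas centring first (which is the effect of the Laplacian construction described below) makes the axes depend only on the shape.

    import numpy as np

    rng = np.random.default_rng(2)

    # A toy point cloud sitting away from the origin in the positive hyperquadrant
    cloud = rng.random((300, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.25]) + 2.0
    shifted = cloud + np.array([20.0, 0.0, 0.0, 0.0, 0.0])   # same shape, new position

    def first_axis(points, centre=False):
        """Leading right singular vector, with or without centring the data first."""
        if centre:
            points = points - points.mean(axis=0)
        _, _, vt = np.linalg.svd(points, full_matrices=False)
        return vt[0]

    # Uncentred (what an SVD of an adjacency matrix does): the axis moves with the cloud,
    # and every later axis, forced to stay orthogonal to it, shifts with it.
    print(abs(first_axis(cloud) @ first_axis(shifted)))      # well below 1
    # Centred: the axis depends only on the cloud's shape, so it stays put.
    print(abs(first_axis(cloud, centre=True) @ first_axis(shifted, centre=True)))  # ~1.0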

The right way to embed a graph is to convert the adjacency matrix to one of several Laplacian matrices. This conversion has the effect of centering the cloud around the origin so that the eigendecomposition/SVD now finds the axes in which the cloud varies, and so gives you the embedding you want.
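As a minimal sketch of that recipe, using the symmetric normalised Laplacian (one of the several variants, and assuming a symmetric adjacency matrix with no isolated nodes):

    import numpy as np

    def laplacian_embedding(A, k=2):
        """Embed a graph given its symmetric, non-negative adjacency matrix A,
        using the symmetric normalised Laplacian L = I - D^(-1/2) A D^(-1/2)."""
        degrees = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(degrees)          # assumes no isolated nodes
        L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

        # Eigenvalues come back sorted in increasing order.
        eigvals, eigvecs = np.linalg.eigh(L)

        # Skip the trivial eigenvector (eigenvalue ~0) and keep the next k:
        # these are the directions in which the now-centred cloud actually varies.
        return eigvecs[:, 1:k + 1]

    # Tiny example: two triangles joined by a single edge
    A = np.array([
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ], dtype=float)

    print(laplacian_embedding(A, k=2))   # the two triangles separate along the first coordinate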

If you see something, say something — and we’ll ignore it

I arrived on a late evening flight at a Canadian airport that will remain nameless, and I was the second person into an otherwise deserted Customs Hall. On a chair was a cloth shoulder bag and a 10″ by 10″ by 4″ opaque plastic container. Being a good citizen, I went over to the distant Customs officers on duty and told them about it. They did absolutely nothing.

There are lessons here about predictive modelling in adversarial settings. The Customs officers were using, in their minds, a Bayesian predictor, which is the way that we, as humans, make many of our predictions. In this Bayesian predictor, the prior probability that the ownerless items contained explosives was very small, so the overall probability that they should act was also very small, and so they didn’t act.

Compare this to the predictive model used by firefighters. When a fire alarm goes off, they don’t consider a prior at all. That is, they don’t consider factors such as: a lot of new students just arrived in town, we just answered a hoax call to this location an hour ago, or anything else of the same kind. They respond regardless of whether they consider it a ‘real’ fire or not.
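A back-of-the-envelope version of the two decision rules, with invented numbers purely for illustration:

    # Invented numbers, purely to illustrate the two decision rules
    prior = 1e-6            # chance that an unattended bag is actually dangerous
    likelihood_ratio = 50   # how much more likely such a report is if it really is dangerous

    # "Customs officer" rule: update the prior, act only if the posterior clears a threshold
    posterior_odds = (prior / (1 - prior)) * likelihood_ratio
    posterior = posterior_odds / (1 + posterior_odds)
    print(f"posterior ~ {posterior:.1e}, act: {posterior > 0.01}")   # still tiny, so no action

    # "Firefighter" rule: the report alone triggers the response; no prior is consulted
    print("firefighter acts: True")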

The challenge is how to train front-line defenders against acts of terror to use the firefighter predictive model rather than the Bayesian one. Clearly, there’s still some distance to go.

Bridging airgaps for amateurs

I’ve pointed out before that air gapping (for example, keeping military networks physically separated from the internet) is a very weak mechanism in a world where most devices have microphones and speakers. Devices can communicate using audio at frequencies that humans in the room can’t hear, so real air gapping requires keeping the two networks separated by enough distance, or enough soundproofing, to prevent this kind of covert channel. The significance of this channel is underappreciated: it’s common even in secure environments to find internet-connected devices in the same room as secure devices.
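To get a sense of how little machinery is needed, here is a sketch of encoding bits as short near-ultrasonic tones (simple frequency-shift keying); the frequencies, bit rate, and output file are arbitrary illustrative choices, and decoding is essentially the reverse, an FFT per time slot.

    import numpy as np
    import wave

    SAMPLE_RATE = 48_000       # supported by most consumer audio hardware
    F0, F1 = 18_000, 19_000    # tones near the top of adult hearing (arbitrary choices)
    BIT_SECONDS = 0.05         # 20 bits per second, again an arbitrary choice

    def encode(bits):
        """Turn a bit string into an audio signal, one tone per bit (simple FSK)."""
        t = np.arange(int(SAMPLE_RATE * BIT_SECONDS)) / SAMPLE_RATE
        return np.concatenate(
            [np.sin(2 * np.pi * (F1 if b == "1" else F0) * t) for b in bits])

    signal = encode("1011001110001101")
    with wave.open("tone.wav", "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)                    # 16-bit samples
        f.setframerate(SAMPLE_RATE)
        f.writeframes((0.3 * 32767 * signal).astype(np.int16).tobytes())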

The ante has been upped a bit by Google’s introduction of Tone, a Chrome add-on that communicates via the audio channel to allow sharing of URLs, in sort of the same way that Palm Pilots used to communicate using infrared. Adapting this app to communicate even more content is surely straightforward, so even amateurs will be able to use the audio channel. Quite apart from the threat to military and intelligence systems, there are many other nasty possibilities, including exfiltrating documents and infecting with malware that can exploit this new channel. And it doesn’t help that its use is invisible (inaudible).

The introduction of LiFi, which will bring many benefits, will also introduce a similar side channel, now that most devices have a camera and a screen.

A world in which cybersecurity is conceived of as a mechanism of walls and gates is looking increasingly obsolete when the network is everywhere, and every gate has holes in it.


