
Getting election winning right

In the previous post I reviewed our model for how to win a U.S. presidential election:

  1. Use high levels of positive language;
  2. Avoid negative language completely;
  3. Stay away from policy;
  4. Don’t mention your opponent.

Joe Biden’s speech at Gettysburg was a textbook example of how to do this (and it’s no easy feat avoiding mentioning your opponent when it’s Trump).

He should have stopped after the first five minutes (hat tip to Bob Newhart’s “On the backs of envelopes, Abe”, and to Lincoln himself, whose address ran to 271 words).

After the first five minutes it became rambling and repetitive. The media hates speeches that fit our model, so the only sound bites came from the second half, which was much less well written.

How to win a US presidential election — reminder

As the US presidential election ramps up, let me remind you of our conclusions about the language patterns used by winners. Since 1992, the winner is the candidate who:

  1. uses high levels of positive language;
  2. avoids all negative language;
  3. stays away from policy and talks in generalities;
  4. doesn’t talk about the opposing candidate.

The reason this works is that voters’ choices are driven not by rational deliberation but by the more immediate appeal of the candidate as a person. The media doesn’t believe in these rules and constantly pushes candidates to do the opposite. First-time candidates often give in to this pressure, which is partly why incumbents tend to do well in presidential elections.

But wait, you say. How did Trump win last time? The answer is that, although he doesn’t do well on 2 and 4, Hillary Clinton did very poorly on all four. So it wasn’t that Trump won, so much as that Hillary Clinton lost.

Based on this model, and its historical success, Biden is doing pretty much exactly what he needs to do.

Detecting intent and abuse in natural language

One of my students has developed a system for detecting intent and abuse in natural language. As part of the validation, he has designed a short survey to get human assessments of how the system performs.

If you’d like to participate, the url is

Thanks in advance!

Democratic debates strategy

In an analysis of the language used by US presidential candidates in the last 7 elections, Christian Leuprecht and I showed that there’s a language pattern that predicts the winner, and even the margin. The pattern is this: use lots of positive language, use no negative language at all (even words like ‘don’t’ and ‘won’t’), talk about abstractions not policy, and don’t talk about your opponent(s). (For example, Trump failed on the fourth point but was good on the others, while Hillary Clinton did poorly on all four.)
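To make the four parts of the pattern concrete, here is a rough per-speech profile built by counting words. This is only an illustration, not the method used in the study: the three mini-lexicons (`POSITIVE`, `NEGATIVE`, `POLICY`), the rate-per-100-words measure, and the equal weighting in `pattern_score` are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicons, for illustration only; a real analysis would
# use much larger, validated word lists.
POSITIVE = {"great", "hope", "together", "future", "strong", "proud"}
NEGATIVE = {"don't", "won't", "not", "never", "wrong", "fail"}
POLICY = {"tax", "plan", "percent", "budget", "deficit", "healthcare"}


def tokenize(text):
    """Lower-case word tokens, keeping apostrophes so "don't" survives."""
    return re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))


def pattern_profile(text, opponent_names):
    """Rate per 100 tokens for each of the four parts of the pattern."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)
    opponents = {name.lower() for name in opponent_names}

    def rate(words):
        return 100 * sum(t in words for t in tokens) / n

    return {
        "positive": rate(POSITIVE),   # want this high
        "negative": rate(NEGATIVE),   # want this at zero
        "policy": rate(POLICY),       # want this low
        "opponent": rate(opponents),  # want this at zero
    }


def pattern_score(profile):
    """Assumed equal weighting: reward positivity, penalise the other three."""
    return (profile["positive"] - profile["negative"]
            - profile["policy"] - profile["opponent"])


upbeat = pattern_profile("We will build a great future together. "
                         "Our future is strong.", ["Smith"])
attack = pattern_profile("Don't trust Smith. His tax plan won't work.", ["Smith"])
print(round(pattern_score(upbeat), 2), pattern_score(attack))
```

Opponent mentions are matched on single tokens, so pass surnames. The upbeat snippet scores well above zero, while the attack line is pulled down by negation, policy words, and the opponent's name all at once.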

In some ways, this pattern is intuitive: voters don’t make rational choices of the most qualified candidate — they vote for someone they relate to.

Why don’t candidates use this pattern? Because the media hates it! Candidates (except Trump) fear being labelled as shallow by the media, even though using the pattern helps them with voters. You can see this at work in the way the opinion pieces decide who ‘won’ the debates.

The Democratic debates show candidates using the opposite strategy: lots of detailed policy, lots of negativity (what’s wrong that I will fix), and lots of putting each other down.

Now, it’s possible that the strategy needed to win a primary is different from the one that wins a general election. But if you want to assess the chances of those who might make it through, this pattern will help you judge how they would fare against Trump in 2020.

Incumbency effects in U.S. presidential campaigns: Language patterns matter, Electoral Studies, Vol. 43, 95-103.

Detecting abusive language online

My student, Hannah Leblanc, has just defended her thesis looking at predicting abusive language. The document is

Rather than treat this as an empirical problem — gather all the signal you can, select attributes using training data, and then build a predictor using those attributes — she started with models of what might drive abusive language. In particular, abuse may be associated with subjectivity (objective language is less likely to be abusive, even if it contains individual words that might look abusive) and with otherness (abuse often results from one group targeting another). She also looked at emotion and mood signals and their association with abuse.

All of the models perform almost perfectly at detecting non-abuse; they struggle more with detecting abuse. Some of this comes from mislabelling — documents that are marked as abusive but really aren’t; but much of the rest comes from missing signal — abusive words disguised so that they don’t match the words of a lexicon.

Overall the model achieves accuracy of 95% and F-score of 0.91.
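The gap between accuracy and F-score follows from the point above. When most documents are not abusive, a model that is very good on non-abuse earns high accuracy even if it misses some abuse, while the F-score for the abusive class is pulled down by those misses. The counts below are invented purely to show the arithmetic; they are not the thesis results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, plus precision/recall/F1 for the positive (abusive) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 50 abusive and 150 non-abusive documents, with
# 10 abusive ones missed (for example, because of disguised words).
m = classification_metrics(tp=40, fp=2, fn=10, tn=148)
print({k: round(v, 3) for k, v in m.items()})
```

With these made-up counts, accuracy is 0.94 while F1 for the abusive class is about 0.87: the ten missed abusive documents barely dent accuracy but cut recall to 0.8.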

Is “sentiment analysis” doing anything real?

Oceans of computational cycles have been spent analysing the sentiment of documents, driven by businesses interested in how their products are being perceived, movie producers interested in their potential products, and just about everyone interested in tweets.

Sentiment is based on a measure of how “positive” or “negative” a particular document is. The problem is that a number of aspects of an individual could be positive or negative, and sentiment analysis jams them all into one bucket and measures them together. It’s far from clear that this measures anything real; one sign of this is the well-known one-and-a-half-star difference when individuals are asked to rate the same objects on two successive days.
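A minimal sketch of how the single bucket arises in a lexicon-based scorer, assuming a toy lexicon invented for this example. The score simply adds up every positive and negative word it finds, whether the word describes the object, the writer’s mood, or a passing emotion.

```python
import re

# Toy polarity lexicon (hypothetical); real systems score thousands of words.
LEXICON = {"excellent": 1, "love": 1, "happy": 1,
           "annoyed": -1, "tired": -1, "awful": -1}


def sentiment(text):
    """Single-bucket score: sum the polarity of every lexicon word found."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(t, 0) for t in tokens)


# Same attitude to the camera; only the writer's state differs.
calm = sentiment("The camera is excellent.")
grumpy = sentiment("I'm tired and annoyed today, but the camera is excellent.")
print(calm, grumpy)
```

Both writers hold the same attitude towards the camera, yet the second document scores as negative overall because the writer’s state is counted in the same total.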

So what can be positive and negative?

It could be the individual’s attitude to a particular object and, of course, this is what most systems purport to be measuring. However, attitude is a two-place relation: A’s attitude to B. It’s usually obvious that a document has been written by A, but much more difficult to make sure that the object about which the attitude is being expressed is actually B.

However, most of the difficulty comes from other aspects that can also be positive and negative. One of these is mood. Mood is an internal setting whose drivers are poorly understood but which is known to be (a) predictable over the course of a period of, say, a day, and (b) composed of two independent components, positive mood and negative mood (that is, not opposites). In broad brush terms, negative mood is stable through the day, while positive mood peaks in the middle of the day. There are longer term patterns as well; positive mood tends to increase through the week while negative mood decreases.

Looking at someone’s writing about an object therefore should take into account their underlying mood — but never does. And it would be difficult to tease apart the signals of mood from the signals of attitude with the current state of the art. But we could plausibly predict that “sentiment” would be less positive overall if it was captured at the beginning or end of the day.
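The time-of-day prediction is easy to check if responses carry timestamps. A minimal sketch, assuming each response has already been given a sentiment score by some scorer and that a 10:00 to 16:00 window is a fair stand-in for “the middle of the day” (both are assumptions, not part of the original argument):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_score_by_period(records, midday=(10, 16)):
    """Average sentiment per part of the day from (timestamp, score) pairs.

    The midday window (10:00 to 15:59 by default) is an assumed cut-off;
    everything outside it is pooled as the start/end of the day.
    """
    start, end = midday
    buckets = defaultdict(list)
    for timestamp, score in records:
        period = "midday" if start <= timestamp.hour < end else "edges"
        buckets[period].append(score)
    return {period: mean(scores) for period, scores in buckets.items()}


# Hypothetical scored responses: (time written, sentiment score).
records = [
    (datetime(2016, 5, 2, 8, 30), 0.1),
    (datetime(2016, 5, 2, 12, 0), 0.6),
    (datetime(2016, 5, 2, 13, 15), 0.4),
    (datetime(2016, 5, 2, 22, 45), -0.1),
]
print(mean_score_by_period(records))
```

On real data, a consistently lower average at the edges of the day than at midday would be consistent with mood leaking into “sentiment”, though it would not by itself separate mood from attitude.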

The other aspect that can be positive or negative is emotion. Emotions are short-term responses to the current environment that play a role in reordering each individual’s priorities to optimize decision making, especially in response to an external stimulus.  There are two emotions that align strongly with positivity (joy) and negativity (disgust).

Looking at someone’s writing about an object should therefore take into account their emotional state (at the time they were writing) — but never does. Again it would be difficult to tease the signals of emotion and the signals of attitude apart. I have no doubt that many businesses get much worse results from their surveys than they ‘should’ because those surveys are designed so poorly that they become annoying, and this spills over into the content of the responses.

Bottom line: there is no such thing as positive sentiment or negative sentiment. There are positive or negative attitudes, moods, and emotions, but the one that sentiment analysis is trying to measure — attitudes — is inextricably confounded by the other two.  Progress is being made in understanding and detecting moods and emotions, but much less has been done on detecting attitudes, mostly because of the difficulty of finding the intended object within a short piece of text.


Advances in Social Network Analysis and Mining Conference — Sydney

This conference will be in Sydney in 2017, from 31st July to 3rd August.

As well as the main conference, there is also a workshop, FOSINT: Foundations of Open Source Intelligence, which may be of even more direct interest for readers of this blog.

I will also be giving a tutorial on Adversarial Analytics as part of the conference.

6.5/7 US presidential elections predicted from language use

I couldn’t do a formal analysis of Trump/Clinton language because Trump didn’t put his speeches online — indeed many of them weren’t scripted. But, as I posted recently, his language was clearly closer to our model of how to win elections than Clinton’s was.

So since 1992, the language model has correctly predicted the outcome, except for 2000 when the model predicted a very slight advantage for Gore over Bush (which is sort of what happened).

People judge candidates on who they seem to be as a person, a large part of which is transmitted by the language they use. Negative and demeaning statements obviously affect this, but so do positivity and optimism.

Voting is not rational choice

Pundits and the media continue to be puzzled by the popularity of Donald Trump. They point out that much of what he says isn’t true, that his plans lack content, that his comments about various subgroups are demeaning, and so on, and so on.

Underlying these plaintive comments is a fundamental misconception about how voters choose the candidate they will vote for. Their choice has much more to do with the snap judgements of character and personality that humans make in the first few seconds than with calm, reasoned decision making.

Our analysis of previous presidential campaigns (about which I’ve posted earlier) makes it clear that this campaign is not fundamentally different in this respect. It’s always been the case that voters decide based on the person who appeals to them most on a deeper than rational level. As we discovered, the successful formula for winning is to be positive (Trump is good at this), not to be negative (Trump is poor at this), not to talk about policy (Trump is good at this), and not to talk about the opponent (Trump is poor at this). On the other hand, Hillary Clinton is poor at all four — she really, really believes in the rational voter.

We’ll see what happens in the election this week. But setting aside the unusual features of this particular election, the way each candidate approaches voters makes it easy to understand why Trump isn’t doing worse and Hillary Clinton isn’t doing better.

It’s not classified emails that are the problem

There’s been reporting that the email trove, belonging to Huma Abedin but found on the laptop of her ex-husband, got there as the result of automatic backups from her phone. This seems plausible; if it is true then it raises issues that go beyond whether any of the emails contain classified information or not.

First, it shows how difficult it is for ordinary people to understand, and realise, the consequences of their choices about configuring their life-containing devices. Backing up emails is good, but every user needs to understand what that means, and how potentially invasive it is.

Second, to work as a backup site, this laptop must have been Internet-facing and (apparently) unencrypted. That means that more than half a million email messages were readily accessible to any reasonably adept cybercriminal or nation-state. If there are indeed classified emails among them, then that’s a big problem.

But even if there are not, access to someone’s emails, given the existence of textual analytics tools, means that a rich picture can be built up of that individual: what they are thinking about, who they are communicating with (their ego network in the jargon), what the rhythm of their day is, where they are located physically, what their emotional state is like, and even how healthy they are.
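Even message headers alone, without reading any content, go a long way. A minimal sketch, assuming the headers have already been parsed into hypothetical `(sender, recipients, sent_time)` tuples, of how the ego network and the daily rhythm fall out of simple counting:

```python
from collections import Counter
from datetime import datetime


def ego_network(owner, messages):
    """Count messages exchanged between the owner and each correspondent."""
    ties = Counter()
    for sender, recipients, _sent in messages:
        if sender == owner:
            ties.update(r for r in recipients if r != owner)
        elif owner in recipients:
            ties[sender] += 1
    return ties


def daily_rhythm(owner, messages):
    """How many messages the owner sent in each hour of the day."""
    return Counter(sent.hour for sender, _rcpts, sent in messages
                   if sender == owner)


# Hypothetical header data: (sender, recipients, time sent).
messages = [
    ("ann@example.org", ["bob@example.org"], datetime(2016, 3, 1, 7, 5)),
    ("bob@example.org", ["ann@example.org", "cy@example.org"],
     datetime(2016, 3, 1, 9, 40)),
    ("ann@example.org", ["bob@example.org", "cy@example.org"],
     datetime(2016, 3, 1, 7, 50)),
]
print(ego_network("ann@example.org", messages).most_common())
print(daily_rhythm("ann@example.org", messages))
```

Scaled up to hundreds of thousands of messages, the same counts reveal the strongest ties and the shape of a working day; content-level analytics then add topics, emotional state, and more.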

For any of us, that kind of analysis would be quite invasive. But when the individual is a close confidante of the U.S. Secretary of State, and when many of the emails are from that same Secretary, a picture at this level of detail is valuable, and could be exploited by an adversary.

Lawyers and the media gravitate to the classified-information issue. This is a 20th-century view of the problems caused by revealing large amounts of personal text. The real issue is an order of magnitude more subtle, but also an order of magnitude more dangerous.

“It’s going to be really great”

Donald Trump continues to be the poster child for our election-winning-language  model: high positive language, as little negative language as possible, and appeals to policy goals without getting into details. The media and pundits are tearing their hair out because he refuses to talk about specifics but, as we predict, it’s working! (Interestingly, I went back and looked at Perot’s language in the 1992 election, and he had more or less the same patterns — and he led the party contenders in national polls for a period in 1992.)

What the media and pundits don’t realise is that incumbent presidents running for a second term use language very similar to Trump’s. It’s just that, with a first-term track record, it’s not as glaringly obvious, and they don’t notice.

Trump’s continuing success

As I posted earlier, our study of previous successful presidential candidates shows that success is very strongly correlated with a particular language model, consisting of:

  • Uniformly positive language
  • Complete absence of negative language
  • Using uplifting, aspirational metaphors rather than policy proposals, and
  • Ignoring the competing candidates

Trump presumably polls well, to a large extent, because he uses this language model (he has been less good at ignoring the competing candidates recently, but that may be an effect of the primary). This language pattern tends to be used by incumbent presidents running for re-election, and seems to derive from their self-perception as already successful in the job they’re re-applying for. Trump, similarly, has huge self-confidence that seems to have the same effect: he perceives himself as (automatically, guaranteed) successful as president.

The successful self-perception issue and the competence issue were hard to separate before, and we’ve used ‘statesmanlike’ to describe the language model of electoral success. All of the presidential incumbents whom we previously studied had both a self-perception of success and demonstrated competence, and we assumed that both were necessary to deploy the required language comfortably and competently. Trump, however, shows that this isn’t so; it’s possible to possess the self-perception of success without previously demonstrated competence. In Trump’s case, presumably, it derives from competence in a rather different job: building a financial empire.

The media is in a frenzy about the competence issue for Trump. But our language model explains how it is possible to be popular among voters without demonstrating much competence, or even planned competence, to solve the problems of the day.

Voters don’t care about objective competence in the way that the media do. They care about the underlying personal self-confidence that is revealed in each candidate’s language. The data is very clear about this.

It may even be the rational view that a voter should take. Presidents encounter, in office, many issues that they had not previously formulated a policy for, so self-confidence may be more valuable than prepackaged plans. And voters have learned that most policies do not get implemented in office anyway.

It’s silly to treat Trump as a front runner when no actual vote has yet been cast. But it wouldn’t be surprising if he continues to do well for some time.  Of the other candidates, only Christie shows any sense of the use of positive language but, as a veteran politician, he cannot seem to avoid the need to present policies.

Results from the first Democratic debate

The debate held on Tuesday night pitted one well known figure (Hillary Clinton) against one up and coming figure (Sanders) and three others with no name recognition except among the wonkiest. The differences in exposure and preparation were obvious. I can’t see that it made any difference to anyone’s opinions.

But it remains interesting to see how well each person did at presenting a persona. Extremely well known politicians do not usually have the luxury of presenting themselves with a new, improved persona because the old one is so well known, so it’s common to find that persona deception scores are low for such candidates. For those who aren’t well-known, the strength of their persona is a blend of how well they can do it personally, and how big the gap is between their previous self-image and the persona that they are trying to project. A relatively unknown candidate with a high persona deception score, therefore, is likely to do well; one with a low score probably will not.

Here are the results from this debate:

The red and green points represent artificial word use corresponding to moderately high and moderately low levels of persona deception. Clinton, as expected (and consistent with my analysis in the 2008 cycle), has low levels of persona deception. Sanders’s levels are in the mid-range. Chafee is sincere, but this won’t help him at his current level of recognition. O’Malley has the highest level of persona deception, which is a positive indicator for him (for what it’s worth in this crowd). Webb is also in the mid-range, but his language use is quite different from that of Sanders.

How not to win an election (Canadian leaders economic debate)

I’ve written about what kind of language politicians should use if they want to win elections. The Canadian party leaders’ economic debate last night was a textbook example of how not to do it.

Here’s the playbook that they were apparently working from:

  • On no account be positive about anything. Rose coloured glasses are actively dangerous when talking about economic issues, because expectations never become reality in this domain.
  • Make sure that you provide all of the details of your economic policies (“Our plan will raise fiscal/tax/payments by 2.3% over 27 months”) but on no account tie them to real people or any aspect of their lives.
  • Compare your policies to those of other parties by first explaining the other party’s plan and THEN explaining why it’s a bad idea. Viewers have long attention spans and are probably taking notes anyway.
  • Make sure that viewers understand the smallest differences between your plan and the competition (“our great plan is 2.3% over 27 months, their appalling plan is only 2.2% and over only 26 months”) — the Mr Rumbold Strategy.
  • If you have a good talking point, make sure you use it at least six times (eight would be better).
  • Talk at the same time as everyone else. This helps viewers hone their cocktail party skills.
  • Make sure to refer to every previous action of the other parties going back to the 1930s. Viewers want to consider track record of the long dead when deciding which party to vote for.
  • Change the names that you call people at random moments — use their surnames for a while and then switch to first names, and then switch back.
  • Make sure that when you’re using talking points, you’re fluent, but when you are going extempore, you use lots of “ums” and “ers” so viewers can easily tell which is which.

And if you want to put on a debate, here are some useful tips:

  • Preface the debate with 15 mins of punditry by members of the sponsoring organisation, even if they have no television skills.
  • Make sure that the background is artificial and creates edges around the debate participants. Save electricity by using as little lighting as possible. Use dark colours — after all the economic picture is gloomy.
  • Forbid the “moderator” from doing anything other than reading out the questions.
  • Make sure that all of the microphones are live at all times. On no account put microphone controls where the producer can access them.

I expect that, after the performances of everyone involved last night and in the prep, there will be a huge demand from the U.S. candidates to hire these people away for more large scale campaigns.

Results from second Republican debate

Regular readers will know that, especially in a crowded marketplace, politicians try to stand out and attract votes by presenting themselves in the best possible light that they can. This is a form of deception, and carries the word-use signals associated with deception, so it can be measured using some straightforward linguistic analysis.

Generally speaking, the candidate who achieves the highest level of this persona deception wins, so candidates try as hard as they can. There are, however, a number of countervailing forces. First, different candidates have quite different levels of ability to put on this kind of persona (Bill Clinton excelled at it). Second, it seems to be quite exhausting, so that candidates have trouble maintaining it from day to day. Third, the difficulty depends on the magnitude of the difference between the previous role and the new one that is the target of a campaign: if a vice-president runs for president, he is necessarily lumbered with the persona that’s been on view in the previous job; if not, it’s easier to present a new persona and make it seem compelling (e.g. Obama in 2008). Outsiders therefore have a greater opportunity to re-invent themselves. Fourth, it depends on the content of what is said: a speech that’s about pie in the sky can easily present a new persona, while one that talks about a candidate’s track record cannot, because it drags the previous persona into at least the candidate’s mind.

Some kinds of preparation can help to improve the persona being presented — a good actor has to be able to do this. But politicians aren’t usually actors manqué so the levels of persona deception that they achieve from day to day emerge from their subconscious and so provide fine-grained insights into how they’re perceiving themselves.

The results from the second round of debates are shown in the figure:


The red and green points represent artificial debate participants who use all of the words of the deception model at high frequency and low frequency respectively.

Most of the candidates fall into the band between these two extremes, with Rand Paul showing the lowest level of persona deception (which is what you might expect). The highest levels of deception belong to Christie and Fiorina, who had obviously prepped extensively and were regarded as having done well, and to Jindal, who is at roughly the same level but via completely different word use.

Comparing these to the results from the first round of debates, there are two obvious changes: Trump has moved from being at the low end of the spectrum to being in the upper-middle; and Carson has moved from having very different language patterns from all of the other candidates to being quite similar to most of them. This suggests that both of them are learning to be better politicians (or being sucked into the political machine, depending on your point of view).

The candidates in the early debate have clustered together on the left hand side of the figure, showing that there was a different dynamic in the two different debates. This is an interesting datum about the strength of verbal mimicry.

The secret of Trump’s success

Looking at US presidential elections through the lens of empirical investigation of word use shows that there’s a pattern of language that is associated with electoral success. Those who use it win, and the difference in the intensity of the pattern correlates well with the margin of victory.

The effective pattern is, in a way, intuitive: use positive language, eliminate negative language completely, talk in the abstract rather than about specific policies, and pay no attention to the other candidates.

In other words, a successful candidate should appear “statesmanlike”.

Candidates find it extremely difficult to use this approach — they feel compelled to compare themselves to the other candidates, dragging in negativity, and to explain the cleverness of their policies. Only incumbent presidents, in our investigation, were able to use this language pattern reliably.

I listened to some of Trump’s speech in Texas last night, and I’ve come to see that the media are completely and utterly wrong about why he is doing so well in the polls. It’s not that he’s tapping into a vein of disaffection with the political system; it is that he’s using this language model. In previous cycles, it’s only been incumbent presidents who’ve had the self-confidence to use it, but Trump, of course, has enough self-confidence to start a retail business selling it.

Let’s look at the components of the model:

Positive language: Trump’s positivity is orders of magnitude above that of the other candidates, and in two ways. First, he is relentlessly positive about the U.S. and about the future (catchphrase: “we can do better”). Second, he’s positive about almost everyone he mentions (catchphrase: “he’s a great guy”).

Negative language: Trump doesn’t avoid negativity altogether, but he uses it cleverly. First, his individual negative targets are not the other candidates (by and large) but pundits — Karl Rove and George Will were mentioned last night, but I doubt if more than 1% of the audience could have identified either in a line-up; so this kind of negativity acts as a lightning rod, without making Trump seem mean. And the negative references to others lack the bitterness that often bleeds through in the negative comments of more typical candidates. Second, when he mentions negative aspects of the Obama administration and its policies and actions, he does it by implication and contrast (“that’s not what I would do”, “I could do better”).

Vision not policies: the media cannot stand that Trump doesn’t come out with detailed policy plans, but it’s been clear for a while that voters don’t pay a lot of attention to policies. They’ve learned that (a) there’s a huge gap between what a president wants to do and what he can actually make happen, and (b) policies are generated with one eye on the polls and focus groups, so they often aren’t something that the candidate has much invested in doing in the first place. [It’s incredible that Secretary Clinton ran focus groups to prep her “apology”, which was actually a meta-apology for not having apologized better or earlier.]

Trump has one huge “policy” advantage — he isn’t beholden to donors, and so is freer of the behind-the-scenes pressure that most candidates face. In the present climate, this has to be a huge selling point.

Ignore the other candidates: Trump doesn’t quite do this (and it gets him into trouble), but he’s learning fast — in last night’s speech, he only mentioned a handful of his competitors and his comments about all of them were positive.

If Trump continues to give this kind of speech, then the more exposure he gets, the more voters are going to like him. I remain doubtful that he will be the Republican nominee, but I don’t see him flaming out any time soon. Even if he makes some serious gaffe, he’ll apologize in seconds and move on (in contrast to Clinton who seems determined to make acute issues into chronic ones).

Republican candidates’ debate: persona deception results

Here are results from the first Republican debate, combining the early and prime-time material into a single corpus.

There’s more detail about the theory in the previous post, but the basic story is: an election campaign is a socially sanctioned exercise in deception; factual deception is completely discounted and so doesn’t matter, but the interesting question is the deception required of each candidate to present themselves as better than they really are; and the candidate who can implement this kind of deception best tends to be the winner. Note that, although deception often has negative connotations, there are many situations where it is considered appropriate, allowed, or condoned: negotiation, dating, selling and marketing — and campaigns are just a different kind of marketing. Sometimes this is called, in the political context, “spin” but it’s really more subtle than that.

The basic plot shows the variation in level of deception, aggregated over all of the turns by each candidate during the debate. The line is the deception axis; the further towards the red end, the stronger the deception. Other variation is caused by variations in the use of different words of the model — different styles.
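Reading a position off the deception axis amounts to projecting a candidate’s word-use profile onto the line through the two artificial reference participants; the distance from that line is the “style” component. A minimal sketch, assuming profiles are plain vectors of per-word values and using ordinary orthogonal projection (the original plots may be produced by a different projection):

```python
import math


def axis_position(x, low, high):
    """Project profile x onto the line from the low to the high reference.

    Returns (t, off_axis): t is 0 at the low-deception reference and 1 at
    the high one; off_axis is the distance from the line, i.e. variation
    due to which model words are used (style) rather than how strongly.
    """
    direction = [h - l for h, l in zip(high, low)]
    length_sq = sum(d * d for d in direction)
    offset = [xi - li for xi, li in zip(x, low)]
    t = sum(o * d for o, d in zip(offset, direction)) / length_sq
    residual = [o - t * d for o, d in zip(offset, direction)]
    return t, math.sqrt(sum(r * r for r in residual))


# Hypothetical two-word profiles: low and high references, one candidate.
print(axis_position([2, 0], [0, 0], [2, 2]))
```

Two candidates with the same `t` sit at the same level of deception even if their `off_axis` values, and hence their word choices, differ.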


These results aren’t terribly surprising. Both Fiorina and Huckabee have broad media experience and so are presumably good at presenting a facade appropriate to many different occasions (and no wonder Fiorina is widely regarded as having “won” the early debate). Trump has low levels of deception — that’s partly because he doesn’t bother with a facade, and partly because the more well-known a person is, the harder it is to successfully present a different facade.

Note, again unsurprisingly, that Carson, while in the middle of the pack on the deception axis, has quite different language patterns from any of the others. That’s partly opportunity — he wasn’t asked the same kind of questions — but partly not being a professional politician.

This figure zooms in to show the structure of the pack in the centre. There isn’t a lot of difference, which reinforces the takeaway that these debates didn’t make a lot of difference, positively or negatively, for most of the candidates.

The contributions of language to the ranking can be looked at by drilling down into this table:

The rows are candidates in alphabetical order (Fiorina 5, Huckabee 8, Perry 13, Trump 15), the columns are 42 of the words of the deception model that were actually used, in decreasing order of overall frequency, and the blocks are darker in colour when a word used by a candidate makes a greater contribution to the model. The top words were: I, but, going, my, me, or, go, take, look, lead, run, rather, without, move, and hate. So Huckabee’s high score comes primarily from low use of first-person singular pronouns, while Fiorina’s comes from heavier use of lower-ranked words that most others didn’t use. There are qualitative similarities between Fiorina’s language and Carson’s (row 2).

In previous presidential election campaigns, the candidate who managed to present the best facade in the strongest way was the winner.

A separate question is: what kind of facade should a candidate choose? We have empirical results about that too. A winning persona is characterised by ignoring policy issues completely, ruthlessly eliminating all negative language, using plenty of positive language, and ignoring the competing candidates. Although, at one level, this seems obvious, no candidate and no campaign can bring themselves to do it until their second presidential campaign. Yet the pattern predicts not only the winner but also the margin of victory.

Canadian election 2015: Leaders’ debate

Regular readers will recall that I’m interested in elections as examples of the language and strategy of influence — what we learn can be applied to understanding jihadist propaganda.

The Canadian election has begun, and last night was the first English-language debate by the four party leaders: Stephen Harper, Elizabeth May, Thomas Mulcair, and Justin Trudeau. Party leaders do not get elected directly, so all four participants had trouble wrapping their minds around whether they were speaking as party spokespeople or as “presidential” candidates.

Deception is a critical part of election campaigns, but not in the way that people tend to think. Politicians make factual misstatements all the time, but voters seem to have already baked this into their assessments, so candidates pay no penalty when they are caught making such statements. This frustrates the media outlets that use fact checking to discover and point out factual misstatements: nobody seems to care, and they can’t figure out why.

Politicians also try to present themselves as smarter, wiser, and generally more qualified for the position for which they’re running, and this is a much more important kind of deception. In a fundamental sense, this is what an election campaign is — a Great White Lie. Empirically, the candidate who is best at this kind of persona deception tends to win.

Therefore, levels of deception are a good predictor of the outcome of an election. Recall that deception in text is signalled by (a) reduced use of first-person singular pronouns, (b) reduced use of so-called exclusive words (“but”, “or”) that introduce extra complexity, (c) increased use of action verbs, and (d) increased use of negative-emotion words. This model can be applied by counting the occurrences of these words, adding them up (with appropriate signs), and computing a score for each document. But it turns out to be much more effective to add a step that weights each word by how much its use varies across the set of documents being considered, and then to compute this weighted score.
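The counting-and-weighting procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual model: the word lists are tiny hypothetical stand-ins for the four categories (the real model has many more words), tokenization is a simple regular expression, and “weighting by how much a word varies” is interpreted here as z-scoring each word’s rate across the documents, so that every word contributes on a comparable scale.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily abbreviated word lists (the real model uses many more
# words). Sign -1: LESS use suggests more deception; +1: MORE use does.
DECEPTION_WORDS = {
    "i": -1, "me": -1, "my": -1,                               # (a) first-person singular
    "but": -1, "or": -1, "without": -1, "rather": -1,          # (b) exclusive words
    "go": +1, "going": +1, "take": +1, "look": +1, "move": +1,  # (c) action verbs
    "hate": +1,                                                # (d) negative emotion
}


def word_rates(text):
    """Occurrences of each model word per token of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return {w: counts[w] / n for w in DECEPTION_WORDS}


def simple_scores(texts):
    """Unweighted version: signed sum of raw rates for each document."""
    return [sum(DECEPTION_WORDS[w] * r for w, r in word_rates(t).items())
            for t in texts]


def weighted_scores(texts):
    """Weighted version (assumed z-scoring): each word's rate is standardized
    across this set of documents, then signed and summed per document."""
    table = [word_rates(t) for t in texts]
    scores = [0.0] * len(texts)
    for w, sign in DECEPTION_WORDS.items():
        col = [row[w] for row in table]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        if sd == 0:  # unused, or used at the same rate everywhere: no signal
            continue
        for i, x in enumerate(col):
            scores[i] += sign * (x - mean) / sd
    return scores
```

For example, `weighted_scores(["I think I will go but I may not", "we will go and take it and move on"])` ranks the second text as more deceptive, because it avoids “I” and “but” and uses more action verbs. The weighted scores are relative to the set of documents supplied: adding or removing a document changes everyone’s score.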

So, I’ve taken the statements by each of the four candidates last night, and put them together into four documents. Then I’ve applied this deception model to these four documents, and ranked the candidates by levels of deceptiveness (in this socially acceptable election-campaign meaning of deceptiveness).

This figure shows, in the columns, the intensity of the 35 model words that were actually used, in decreasing frequency order. The rows are the four leaders in alphabetical order: Harper, May, Mulcair, Trudeau; and the colours are the intensity of the use of each word by each leader. The top few words are: I, but, going, go, look, take, my, me, taking, or. But remember, a large positive value means a strong contribution of this word to deception, not necessarily a high frequency. So the brown bar in column 1 of May’s row indicates a strong contribution coming from the word “I”, which actually corresponds to low rates of “I”.

This figure shows a plot of the variation among the four leaders. The line is oriented from most deceptive to least deceptive, so deception increases from the upper right to the lower left.

Individuals appear in different places because of different patterns of word use. Each leader’s point can be projected onto this line to generate a (relative) deception score.

May appears at the most deceptive end of the spectrum. Trudeau and Harper appear at almost the same level, and Mulcair appears significantly lower. The black point represents an artificial document in which each word of the model is used at one standard deviation above neutral, so it represents a document that is quite deceptive.
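A sketch of how such a plot and projection might be produced, assuming the deception-model word rates have already been z-scored across the documents and multiplied by each word’s sign (so that larger always means more deceptive). Using a singular value decomposition for the 2-D layout, and the function name `deception_axis_scores`, are assumptions for illustration. As in the text, the artificial document is modelled as one standard deviation toward deception on every word.

```python
import numpy as np


def deception_axis_scores(signed_z):
    """Place documents in a 2-D plot and project them onto a deception axis.

    signed_z: (n_docs, n_words) array; each column is one model word's rate,
    z-scored across the documents and multiplied by the word's sign, so that
    larger values always mean 'more deceptive'.
    """
    # The top two right singular vectors span the 2-D plot.
    _, _, vt = np.linalg.svd(signed_z, full_matrices=False)
    axes = vt[:2].T                              # (n_words, 2)
    coords = signed_z @ axes                     # each document as a 2-D point
    # Artificial document: every word one standard deviation toward deception.
    artificial = np.ones(signed_z.shape[1])
    art_xy = artificial @ axes
    direction = art_xy / np.linalg.norm(art_xy)  # unit vector along the line
    scores = coords @ direction                  # relative deception scores
    return coords, art_xy, scores
```

Singular vectors are determined only up to sign, so the plot may come out mirrored from run to run. The projected scores do not change, however, because the documents and the artificial point flip together.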

You might conclude from this that May managed much higher levels of persona deception than the other candidates and so is destined to win. However, there are two reasons why her levels are high. First, she said much less than the other candidates, so her results are distorted by the necessary normalizations. Second, she used “I” far less often than the others. Her interactions were also often short, reducing the opportunities for some kinds of words, notably the exclusive words, to be used at all.

Mulcair’s levels are relatively low because he took a couple of opportunities to talk autobiographically. Intuitively this seems to be a good strategy (appeal to voters with a human face), but unfortunately it tends not to work well. To say “I will implement a wonderful plan” invites the hearer to disbelieve that the speaker actually can. Saying instead “We will implement a wonderful plan” makes the hearer’s disbelief harder, because they have to eliminate more possibilities. Saying “A wonderful plan will be implemented” makes it a bit harder still.

It’s hard to draw strong conclusions in the Canadian setting because elections aren’t as much about personalities. But it looks as if this leaders’ debate might have been a wash, with perhaps a slight downward nudge for Mulcair.

Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda

I’ve just been working on assessing the potential impact of the three major magazines competing for the market in lone-wolf jihadists in the West: Inspire (AQAP), Azan (Taliban), and Dabiq (ISIS).

I compare these magazines using models for the intensity of informative, imaginative, deceptive, jihadist, and gamification language, and build an empirical model for propaganda which combines these into a single scale.
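The paper’s exact combination method is not described in this post. The following is therefore only a hypothetical sketch of one simple way to merge several per-document language intensities into a single scale: standardize each model’s scores across the documents and take a weighted sum. Several things here are assumptions: equal weights of +1, the `MODELS` names used as dictionary keys, and the toy numbers in the test. In practice, some components might deserve negative or unequal weights.

```python
import statistics

# Hypothetical names for the five component models' intensity scores.
MODELS = ("informative", "imaginative", "deceptive", "jihadist", "gamification")


def propaganda_scale(docs, weights=None):
    """Combine per-document model intensities into one score per document.

    docs: {doc_name: {model_name: intensity}}
    weights: {model_name: weight}; defaults to +1 for every model (an assumption).
    """
    weights = weights or {m: 1.0 for m in MODELS}
    names = list(docs)
    combined = {n: 0.0 for n in names}
    for m in MODELS:
        col = [docs[n][m] for n in names]
        mean, sd = statistics.fmean(col), statistics.pstdev(col)
        if sd == 0:  # this model does not distinguish the documents
            continue
        for n, x in zip(names, col):
            combined[n] += weights[m] * (x - mean) / sd
    return combined
```

Because each component is standardized first, no single model dominates merely because its raw intensities happen to be on a larger numeric scale. The resulting scores are relative to the set of documents supplied.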

Unsurprisingly, Dabiq ranks highest in propaganda intensity.

The details can be found in the full draft paper, posted to SSRN:

Skillicorn, David, Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda (January 7, 2015). Available at SSRN:

Election winning language patterns

One of the Freakonomics books makes the point that, in football (aka soccer), a reasonable strategy for a goaltender facing a penalty kick is to stay in the middle of the goal rather than diving to one side or the other. But goaltenders hardly ever follow this strategy, because they look like such fools if it doesn’t pay off. Better to dive, and look as if you tried, even if it turns out that you dived the wrong way.

We’ve been doing some work on what kind of language wins elections in the U.S., and there are some similarities between the strategy that works and the goaltending strategy.

We looked at the language patterns of all of the candidates in U.S. presidential elections over the past 20 years, and a very clear language pattern for success emerged. Across all campaigns, the candidate who best deployed this language won, and the margin of victory relates quite strongly to how well the language was used (for example, Bush and Gore used this pattern at virtually identical levels in 2000, an election decided by an extremely narrow margin).

What is this secret language pattern that guarantees success? It isn’t very surprising: high levels of positive words, non-existent levels of negative words, abstract words in preference to concrete ones, and complete absence of reference to the opposing candidate(s).
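As an illustration only, the four ingredients of this pattern can be turned into a rough per-speech profile. Several parts of this sketch are hypothetical simplifications, not the lexicons or weights used in the actual study: the mini-lexicons, the equal weighting in `score`, the treatment of an opponent’s name as a single lowercase token, and the opponent name “Smith” in the test.

```python
import re

# Hypothetical mini-lexicons, for illustration only (no overlaps between them).
POSITIVE = {"hope", "great", "together", "strong", "proud", "believe"}
NEGATIVE = {"fail", "failed", "crisis", "disaster", "weak", "wrong", "fear"}
ABSTRACT = {"freedom", "opportunity", "future", "values", "justice", "dream"}
CONCRETE = {"tax", "tariff", "percent", "billion", "pipeline", "wall"}


def winning_pattern_profile(text, opponents):
    """Per-word rates of each ingredient of the pattern, plus a naive score."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)

    def rate(lexicon):
        return sum(w in lexicon for w in words) / n

    profile = {
        "positive": rate(POSITIVE),                                  # want high
        "negative": rate(NEGATIVE),                                  # want zero
        "abstract_minus_concrete": rate(ABSTRACT) - rate(CONCRETE),  # want > 0
        "opponent_mentions": rate({o.lower() for o in opponents}),   # want zero
    }
    # Equal weights are an assumption; the study's weighting is not given here.
    profile["score"] = (profile["positive"] - profile["negative"]
                        + profile["abstract_minus_concrete"]
                        - profile["opponent_mentions"])
    return profile
```

A speech such as “We believe in hope and opportunity and a proud future together” scores positively. A speech such as “Smith failed and the tax crisis is a disaster” scores negatively on every ingredient: it is negative, concrete, and names the opponent.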

What was surprising is how this pattern was learned and used. No candidate used it from the start; they all began with much more conventional patterns: negativity, content-filled policy statements, and comparisons between themselves and the other candidates. With success came a change in language, and the second surprise is that the change happened, in every case, over a period of little more than a month. For some presidents, it happened around the time of their inaugurations; for others, around the time of their second campaign; but it was never a gradual learning curve. This suggests that what happens is not an improved understanding, conscious or unconscious, of what language works. It is rather a change in their view of themselves that allows them to become more statesmanlike (good interpretation) or entitled (bad interpretation). The reason that presidents are almost always re-elected is that they use this language pattern well in their second campaigns. (It’s not a matter of changing speechwriting teams, or of changes in the world and so in the topics being talked about; the pattern is almost independent of content.)
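The claim that the shift was abrupt rather than gradual is the kind of thing that can be checked on a chronological sequence of per-speech language scores. A hypothetical sketch: find the best single step (two flat segments), and compare its error with that of a straight-line trend. If the step fits far better, the change looks abrupt. The 0.5 threshold and the function names are assumptions, not the method used in the study.

```python
def sse(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def best_step(scores):
    """Best single split k: scores[:k] and scores[k:] each fit by its own mean."""
    best_k, best_err = None, float("inf")
    for k in range(1, len(scores)):
        err = sse(scores[:k]) + sse(scores[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


def linear_sse(scores):
    """Residual sum of squares of an ordinary least-squares line over time."""
    n = len(scores)
    mx, my = (n - 1) / 2, sum(scores) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    slope = sum((x - mx) * (y - my) for x, y in enumerate(scores)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(scores))


def looks_abrupt(scores, ratio=0.5):
    """True if one step fits much better than a gradual linear trend."""
    if len(scores) < 3:
        raise ValueError("need at least three scores")
    _, step_err = best_step(scores)
    return step_err < ratio * linear_sse(scores)
```

The step model has one more free parameter than the line, so a threshold well below 1 guards against calling every noisy sequence abrupt. With real data, the index returned by `best_step` would also locate when the change happened.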

So there’s plenty of evidence that using language like this leads to electoral success but, just as for goaltenders, no candidate can bring himself or herself to use it, because they’d feel so silly if it didn’t work and they lost.