Posts Tagged 'US election'

Getting election winning right

In the previous post I reviewed our model for how to win a U.S. presidential election:

  1. Use high levels of positive language;
  2. Avoid negative language completely;
  3. Stay away from policy;
  4. Don’t mention your opponent.

Joe Biden’s speech at Gettysburg was a textbook example of how to do this (and it’s no easy feat avoiding mentioning your opponent when it’s Trump).

He should have stopped after the first five minutes (HT Bob Newhart “On the backs of envelopes, Abe”, also Lincoln himself, 271 words).

After the first five minutes it got rambling and repetitive. The media hates speeches that fit our model, and so the only sound bites came from the second half, which was much less well-written.

How to win a US presidential election — reminder

As the US presidential election ramps up, let me remind you of our conclusions about the language patterns used by winners. Since 1992, the winner is the candidate who:

  1. uses high levels of positive language;
  2. avoids all negative language;
  3. stays away from policy and talks in generalities;
  4. doesn’t talk about the opposing candidate.

https://www.sciencedirect.com/science/article/pii/S0261379416302062

The reason this works is that the choices made by voters are not driven by rational choice but by a more immediate appeal of the candidate as a person. The media doesn’t believe in these rules, and constantly tries to drive candidates to do the opposite. For first time candidates this pressure often works, which is partly why incumbents tend to do well in presidential elections.

But wait, you say. How did Trump win last time? The answer is that, although he doesn’t do well on 2 and 4, Hillary Clinton did very poorly on all four. So it wasn’t that Trump won, so much as that Hillary Clinton lost.

Based on this model, and its historical success, Biden is doing pretty much exactly what he needs to do.

Democratic debates strategy

In an analysis of the language used by US presidential candidates in the last 7 elections, Christian Leuprecht and I showed that there’s a language pattern that predicts the winner, and even the margin. The pattern is this: use lots of positive language, use no negative language at all (even words like ‘don’t’ and ‘won’t’), talk about abstractions not policy, and don’t talk about your opponent(s). (For example, Trump failed on the fourth point, but was good on the others, while Hillary Clinton did poorly on all four.)

In some ways, this pattern is intuitive: voters don’t make rational choices of the most qualified candidate — they vote for someone they relate to.

Why don’t candidates use this pattern? Because the media hates it! Candidates (except Trump) fear being labelled as shallow by the media, even though using the pattern helps them with voters. You can see this at work in the way the opinion pieces decide who ‘won’ the debates.

The Democratic debates show candidates using the opposite strategy: lots of detailed policy, lots of negativity (what’s wrong that I will fix), and lots of putting each other down.

Now it’s possible that the strategy needed to win a primary is different from the one that wins a general election. But if you want to assess the chances of those who might make it through, this pattern will help you judge their chances against Trump in 2020.

Incumbency effects in U.S. presidential campaigns: Language patterns matter, Electoral Studies, Vol. 43, 95-103.
https://www.sciencedirect.com/science/article/pii/S0261379416302062

6.5/7 US presidential elections predicted from language use

I couldn’t do a formal analysis of Trump/Clinton language because Trump didn’t put his speeches online — indeed many of them weren’t scripted. But, as I posted recently, his language was clearly closer to our model of how to win elections than Clinton’s was.

So since 1992, the language model has correctly predicted the outcome, except for 2000 when the model predicted a very slight advantage for Gore over Bush (which is sort of what happened).

People judge candidates on who they seem to be as a person, a large part of which is transmitted by the language they use. Negative and demeaning statements obviously affect this, but so do positivity and optimism.

Voting is not rational choice

Pundits and the media continue to be puzzled by the popularity of Donald Trump. They point out that much of what he says isn’t true, that his plans lack content, that his comments about various subgroups are demeaning, and so on, and so on.

Underlying these plaintive comments is a fundamental misconception about how voters choose the candidate they will vote for. This has much more to do with standard human first-few-seconds judgements of character and personality than with calm, reasoned decision making.

Our analysis of previous presidential campaigns (about which I’ve posted earlier) makes it clear that this campaign is not fundamentally different in this respect. It’s always been the case that voters decide based on the person who appeals to them most on a deeper than rational level. As we discovered, the successful formula for winning is to be positive (Trump is good at this), not to be negative (Trump is poor at this), not to talk about policy (Trump is good at this), and not to talk about the opponent (Trump is poor at this). On the other hand, Hillary Clinton is poor at all four — she really, really believes in the rational voter.

We’ll see what happens in the election this week. But apart from the unusual facts of this presidential election, it’s easy to understand why Trump isn’t doing worse and Hillary Clinton isn’t doing better from the way they approach voters.

It’s not classified emails that are the problem

There’s been reporting that the email trove, belonging to Huma Abedin but found on the laptop of her ex-husband, got there as the result of automatic backups from her phone. This seems plausible; if it is true then it raises issues that go beyond whether any of the emails contain classified information or not.

First, it shows how difficult it is for ordinary people to understand, and realise, the consequences of their choices about configuring their life-containing devices. Backing up emails is good, but every user needs to understand what that means, and how potentially invasive it is.

Second, to work as a backup site, this laptop must have been Internet-facing and (apparently) unencrypted. That means that more than half a million email messages were readily accessible to any reasonably adept cybercriminal or nation-state. If there are indeed classified emails among them, then that’s a big problem.

But even if there are not, access to someone’s emails, given the existence of textual analytics tools, means that a rich picture can be built up of that individual: what they are thinking about, who they are communicating with (their ego network in the jargon), what the rhythm of their day is, where they are located physically, what their emotional state is like, and even how healthy they are.

For any of us, that kind of analysis would be quite invasive. But when the individual is a close confidante of the U.S. Secretary of State, and when many of the emails are from that same Secretary, a picture of them at this level of detail is valuable, and could be exploited by an adversary.
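Some of this analysis is surprisingly easy to start on. Here’s a minimal sketch of the simplest piece — building an ego network from nothing more than message headers — using only the Python standard library (the mbox path is hypothetical; nothing here is specific to this case):

```python
from collections import Counter
from email.utils import getaddresses
import mailbox

def ego_network(mbox_path):
    """Count how often each correspondent appears in the
    From/To/Cc headers of an mbox archive."""
    contacts = Counter()
    for msg in mailbox.mbox(mbox_path):
        headers = (msg.get_all('From', []) +
                   msg.get_all('To', []) +
                   msg.get_all('Cc', []))
        for _name, addr in getaddresses(headers):
            if addr:
                contacts[addr.lower()] += 1
    return contacts

# e.g. ego_network("backup.mbox").most_common(10) lists the ten
# addresses this mailbox owner exchanges mail with most often
```

No message bodies are read at all, which is the point: headers alone already reveal who someone talks to and how often.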

Lawyers and the media gravitate to the classified information issue. This is a 20th Century view of the problems that revealing large amounts of personal text cause. The real issue is an order of magnitude more subtle, but also an order of magnitude more dangerous.

The real problem with the Clinton email server

Every intelligence person I’ve talked to has told me that the probability that the Russians and Chinese (at least) hacked Hillary Clinton’s email server is 100%.

While the question of whether any of the emails were classified, about to be classified, or should have been classified is interesting, the real risk created by the use of this server is that it provided a real-time look at the communications of the Secretary of State (and the people she was talking to).

Even the unclassified emails provided insight into the Secretary’s state of mind, plans, location, and intentions. Some of these might have been obvious; others would follow from examining email headers; and others by carrying out textual analysis (which is getting quite good at reverse engineering mental state, as regular readers will know).

Access to your entire email stream + some analytic capacity = fairly complete understanding of your life.

(Note that Google already does this for everyone who has a gmail account, and also for anyone who sends or receives email from anyone with a gmail account.)

Added 2016/05/06: A new problem now arises: control of the presidential election is in the hands of any country that can claim to have hacked the server. While hacking by a foreign power remains a (virtually certain) hypothetical, it is having no visible impact on the election. But if a foreign power were to leak that it had hacked the server and exploited it somehow, the impact would surely be catastrophic. And I can imagine several of America’s enemies who might prefer a President Trump to a President Clinton II.

“It’s going to be really great”

Donald Trump continues to be the poster child for our election-winning-language model: high positive language, as little negative language as possible, and appeals to policy goals without getting into details. The media and pundits are tearing their hair out because he refuses to talk about specifics but, as we predict, it’s working! (Interestingly, I went back and looked at Perot’s language in the 1992 election, and he had more or less the same patterns — and he led the party contenders in national polls for a period in 1992.)

What the media and pundits don’t realise is that incumbent presidents running for a second term use language very similar to Trump’s. It’s just that, with a first-term track record, it’s not as glaringly obvious, and they don’t notice.

Trump’s continuing success

As I posted earlier, our study of previous successful presidential candidates shows that success is very strongly correlated with a particular language model, consisting of:

  • Uniformly positive language
  • Complete absence of negative language
  • Using uplifting, aspirational metaphors rather than policy proposals, and
  • Ignoring the competing candidates

Trump presumably polls well, to a large extent, because he uses this language model (not so much ignoring of the competing candidates recently, but maybe that’s the effect of a primary). This language pattern tends to be used by incumbent presidents running for re-election, and seems to derive from their self-perception as already-successful in the job they’re re-applying for. Trump, similarly, possesses huge self confidence that seems to have the same effect — he perceives himself as (automatically, guaranteed) successful as president.

The dynamic between the successful self-perception issue and the competence issue was hard to separate before; and we’ve used ‘statesmanlike’ to describe the model of language of electoral success. All of the presidential incumbents whom we previously studied had a self-perception of success and a demonstrated competence and we assumed that both were necessary to deploy the required language comfortably and competently. Trump, however, shows that this isn’t so — it’s possible to possess the self-perception of success without the previously demonstrated competence. In Trump’s case, presumably, it is derived from competence in a rather different job: building a financial empire.

The media is in a frenzy about the competence issue for Trump. But our language model explains how it is possible to be popular among voters without demonstrating much competence, or even planned competence, to solve the problems of the day.

Voters don’t care about objective competence in the way that the media do. They care about the underlying personal self-confidence that is revealed in each candidate’s language. The data is very clear about this.

It may even be the rational view that a voter should take. Presidents encounter, in office, many issues that they had not previously formulated a policy for, so self-confidence may be more valuable than prepackaged plans. And voters have learned that most policies do not get implemented in office anyway.

It’s silly to treat Trump as a front runner when no actual vote has yet been cast. But it wouldn’t be surprising if he continues to do well for some time. Of the other candidates, only Christie shows any sense of the use of positive language but, as a veteran politician, he cannot seem to avoid the need to present policies.

Results from the first Democratic debate

The debate held on Tuesday night pitted one well known figure (Hillary Clinton) against one up and coming figure (Sanders) and three others with no name recognition except among the wonkiest. The differences in exposure and preparation were obvious. I can’t see that it made any difference to anyone’s opinions.

But it remains interesting to see how well each person did at presenting a persona. Extremely well known politicians do not usually have the luxury of presenting themselves with a new, improved persona because the old one is so well known, so it’s common to find that persona deception scores are low for such candidates. For those who aren’t well-known, the strength of their persona is a blend of how well they can do it personally, and how big the gap is between their previous self-image and the persona that they are trying to project. A relatively unknown candidate with a high persona deception score, therefore, is likely to do well; one with a low score probably will not.

Here are the results from this debate:

Persona deception scores from the first Democratic debate

The red and green points represent artificial word use corresponding to moderately high and moderately low levels of persona deception. Clinton, as expected (and from my analysis in the 2008 cycle), has low levels of persona deception. Sanders’s levels are in the mid-range. Chafee is sincere, but this won’t help him with his current level of recognition. O’Malley has the highest level of persona deception, which is a positive indicator for him (for what it’s worth in this crowd). Webb is also in the midrange, but his language use is quite different from that of Sanders.

Results from second Republican debate

Regular readers will know that, especially in a crowded marketplace, politicians try to stand out and attract votes by presenting themselves in the best possible light that they can. This is a form of deception, and carries the word-use signals associated with deception, so it can be measured using some straightforward linguistic analysis.
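To make “straightforward linguistic analysis” concrete, here is a toy sketch of a deception-style score. The word lists are illustrative stand-ins, not the full categories of the published Pennebaker model, and the weighting is the simplest thing that could work (deceptive language tends to show fewer first-person singular and exclusive words, and more negative-emotion and motion words):

```python
import re

# Illustrative (incomplete) word lists; a real analysis would use
# the full LIWC-style categories of the deception model.
FIRST_PERSON = {"i", "me", "my", "mine", "i'm", "i've", "i'll"}
EXCLUSIVE    = {"but", "except", "without", "although", "unless", "or"}
NEG_EMOTION  = {"hate", "worthless", "enemy", "sad", "angry", "afraid"}
MOTION       = {"go", "going", "move", "run", "take", "lead", "walk"}

def deception_score(text):
    """Higher = word use more consistent with the deception model:
    fewer first-person singular and exclusive words, more
    negative-emotion and motion words (rates per 100 words)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    def rate(category):
        return 100.0 * sum(w in category for w in words) / len(words)
    return (rate(NEG_EMOTION) + rate(MOTION)
            - rate(FIRST_PERSON) - rate(EXCLUSIVE))
```

Aggregating this over all of a candidate’s turns in a debate gives one point per candidate, which is essentially what the figures below plot.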

Generally speaking, the candidate who achieves the highest level of this persona deception wins, so candidates try as hard as they can. There are, however, a number of countervailing forces. First, different candidates have quite different levels of ability to put on this kind of persona (Bill Clinton excelled at it). Second, it seems to be quite exhausting, so that candidates have trouble maintaining it from day to day. Third, the difficulty depends on the magnitude of the difference between the previous role and the new one that is the target of a campaign: if a vice-president runs for president, he is necessarily lumbered with the persona that’s been on view in the previous job; if not, it’s easier to present a new persona and make it seem compelling (e.g. Obama in 2008). Outsiders therefore have a greater opportunity to re-invent themselves. Fourth, it depends on the content of what is said: a speech that’s about pie in the sky can easily present a new persona, while one that talks about a candidate’s track record cannot, because it drags the previous persona into at least the candidate’s mind.

Some kinds of preparation can help to improve the persona being presented — a good actor has to be able to do this. But politicians aren’t usually actors manqué so the levels of persona deception that they achieve from day to day emerge from their subconscious and so provide fine-grained insights into how they’re perceiving themselves.

The results from the second round of debates are shown in the figure:

Persona deception scores from the second Republican debate

The red and green points represent artificial debate participants who use all of the words of the deception model at high frequency and low frequency respectively.

Most of the candidates fall into the band between these two extremes, with Rand Paul showing the lowest level of persona deception (which is what you might expect). The highest levels of deception belong to Christie and Fiorina, who had obviously prepped extensively and were regarded as having done well; and Jindal, who is roughly at the same level, but via completely different word use.

Comparing these to the results from the first round of debates, there are two obvious changes: Trump has moved from being at the low end of the spectrum to being in the upper-middle; and Carson has moved from having very different language patterns from all of the other candidates to being quite similar to most of them. This suggests that both of them are learning to be better politicians (or being sucked into the political machine, depending on your point of view).

The candidates in the early debate have clustered together on the left hand side of the figure, showing that there was a different dynamic in the two different debates. This is an interesting datum about the strength of verbal mimicry.

The secret of Trump’s success

Looking at US presidential elections through the lens of empirical investigation of word use shows that there’s a pattern of language that is associated with electoral success. Those who use it win, and the difference in the intensity of the pattern correlates well with the margin of victory.

The effective pattern is, in a way, intuitive: use positive language, eliminate negative language completely, talk in the abstract rather than about specific policies, and pay no attention to the other candidates.

In other words, a successful candidate should appear “statesmanlike”.

Candidates find it extremely difficult to use this approach — they feel compelled to compare themselves to the other candidates, dragging in negativity, and to explain the cleverness of their policies. Only incumbent presidents, in our investigation, were able to use this language pattern reliably.

I listened to some of Trump’s speech in Texas last night, and I’ve come to see that the media are completely and utterly wrong about why he is doing so well in the polls. It’s not that he’s tapping into a vein of disaffection with the political system; it is that he’s using this language model. In previous cycles, it’s only been incumbent presidents who’ve had the self-confidence to use it, but Trump, of course, has enough self-confidence to start a retail business selling it.

Let’s look at the components of the model:

Positive language: Trump’s positivity is orders of magnitude above that of the other candidates, and in two ways. First, he is relentlessly positive about the U.S. and about the future (catchphrase: “we can do better”). Second, he’s positive about almost everyone he mentions (catchphrase: “he’s a great guy”).

Negative language: Trump doesn’t avoid negativity altogether, but he uses it cleverly. First, his individual negative targets are not the other candidates (by and large) but pundits — Karl Rove and George Will were mentioned last night, but I doubt if more than 1% of the audience could have identified either in a line-up; so this kind of negativity acts as a lightning rod, without making Trump seem mean. And the negative references to others lack the bitterness that often bleeds through in the negative comments of more typical candidates. Second, when he mentions negative aspects of the Obama administration and its policies and actions, he does it by implication and contrast (“that’s not what I would do”, “I could do better”).

Vision not policies: the media cannot stand that Trump doesn’t come out with detailed policy plans, but it’s been clear for a while that voters don’t pay a lot of attention to policies. They’ve learned that (a) there’s a huge gap between what a president can want to do and what he can actually make happen, and (b) policies are generated with one eye on the polls and focus groups, so they often aren’t something that the candidate has much invested in doing in the first place. [It’s incredible that Secretary Clinton ran focus groups to prep her “apology”, which was actually a meta-apology for not having apologized better or earlier.]

Trump has one huge “policy” advantage — he isn’t beholden to donors, and so is freer of the behind-the-scenes pressure that most candidates face. In the present climate, this has to be a huge selling point.

Ignore the other candidates: Trump doesn’t quite do this (and it gets him into trouble), but he’s learning fast — in last night’s speech, he only mentioned a handful of his competitors and his comments about all of them were positive.

If Trump continues to give this kind of speech, then the more exposure he gets, the more voters are going to like him. I remain doubtful that he will be the Republican nominee, but I don’t see him flaming out any time soon. Even if he makes some serious gaffe, he’ll apologize in seconds and move on (in contrast to Clinton who seems determined to make acute issues into chronic ones).

Republican candidates’ debate: persona deception results

Here are results from the first Republican debate, combining the early and prime-time material into a single corpus.

There’s more detail about the theory in the previous post, but the basic story is: an election campaign is a socially sanctioned exercise in deception; factual deception is completely discounted and so doesn’t matter, but the interesting question is the deception required of each candidate to present themselves as better than they really are; and the candidate who can implement this kind of deception best tends to be the winner. Note that, although deception often has negative connotations, there are many situations where it is considered appropriate, allowed, or condoned: negotiation, dating, selling and marketing — and campaigns are just a different kind of marketing. Sometimes this is called, in the political context, “spin” but it’s really more subtle than that.

The basic plot shows the variation in level of deception, aggregated over all of the turns by each candidate during the debate. The line is the deception axis; the further towards the red end, the stronger the deception. Other variation is caused by variations in the use of different words of the model — different styles.

Persona deception scores from the first Republican debate

These results aren’t terribly surprising. Both Fiorina and Huckabee have broad media experience and so are presumably good at presenting a facade appropriate to many different occasions (and no wonder Fiorina is widely regarded as having “won” the early debate). Trump has low levels of deception — that’s partly because he doesn’t bother with a facade, and partly because the more well-known a person is, the harder it is to successfully present a different facade.

Note, again unsurprisingly, that Carson, while in the middle of the pack on the deception axis, has quite different language patterns from any of the others. That’s partly opportunity — he wasn’t asked the same kind of questions — but partly not being a professional politician.

Zoomed view of the centre of the pack

This figure zooms in to show the structure of the pack in the centre. There isn’t a lot of difference, which reinforces the takeaway that these debates didn’t make a lot of difference, positively or negatively, for most of the candidates.

The contributions of language to the ranking can be looked at by drilling down into this table:

Word contributions to the deception model

The rows are candidates in alphabetical order (Fiorina 5, Huckabee 8, Perry 13, Trump 15), the columns are 42 of the words of the deception model that were actually used, in decreasing order of overall frequency, and the blocks are darker in colour when a word used by a candidate makes a greater contribution to the model. The top words were: I, but, going, my, me, or, go, take, look, lead, run, rather, without, move, and hate. So Huckabee’s high score comes primarily from low use of first-person singular pronouns, while Fiorina’s comes from heavier use of lower-ranked words that most others didn’t use. There are qualitative similarities between Fiorina’s language and Carson’s (row 2).
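For readers curious how a plot like this might be produced: one plausible sketch (the posts don’t specify the actual pipeline) is to build a candidate-by-word rate matrix, append two artificial rows with all model words at uniformly high and low rates, and project everything into two dimensions with an SVD, so the artificial points anchor the deception axis:

```python
import numpy as np

def deception_space(freqs, labels):
    """freqs: candidates x model-words matrix of word rates.
    Appends two artificial rows (every word at the highest / lowest
    observed rate), then projects all rows into 2-D with an SVD.
    The artificial points define the ends of the deception axis."""
    hi = np.full(freqs.shape[1], freqs.max())
    lo = np.full(freqs.shape[1], freqs.min())
    m = np.vstack([freqs, hi, lo])
    m = m - m.mean(axis=0)              # centre before decomposition
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    coords = u[:, :2] * s[:2]           # 2-D coordinates per row
    return dict(zip(labels + ["HIGH", "LOW"], coords))
```

A candidate’s position along the HIGH-LOW line is then the deception score; displacement off that line reflects stylistic differences in which model words are used, which is how a candidate like Jindal can score high “via completely different word use”.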

In previous presidential election campaigns, the candidate who managed to present the best facade in the strongest way was the winner.

A separate question is: what kind of facade should a candidate choose? We have empirical results about that too. A winning persona is characterised by: ignoring policy issues completely, ruthlessly eliminating all negative language, using plenty of positive language, and ignoring the competing candidates. Although, at one level, this seems obvious, no candidate and no campaign can bring themselves to do it until their second presidential campaign. But not only does it predict the winner, the margin of victory is also predictable from it as well.

Election winning language patterns

One of the Freakonomics books makes the point that, in football aka soccer, a reasonable strategy when facing a penalty kick is to stay in the middle of the goal rather than diving to one side or the other — but goaltenders hardly ever follow this strategy because they look like such fools if it doesn’t pay off. Better to dive, and look as if you tried, even if it turns out that you dived the wrong way.

We’ve been doing some work on what kind of language to use to win elections in the U.S. and there are some similarities between the strategy that works, and the goal tending strategy.

We looked at the language patterns of all of the candidates in U.S. presidential elections over the past 20 years, and a very clear language pattern for success emerged. Over all campaigns, the candidate who best deployed this language won, and the margin of victory relates quite strongly to how well the language was used (for example, Bush and Gore used this pattern at virtually identical levels in 2000).

What is this secret language pattern that guarantees success? It isn’t very surprising: high levels of positive words, non-existent levels of negative words, abstract words in preference to concrete ones, and complete absence of reference to the opposing candidate(s).
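As a toy illustration of measuring the first, second, and fourth components (abstraction vs. policy is harder to operationalise and is omitted here) — the word lists and opponent names are made up for the example, not the ones used in the actual study:

```python
import re

# Toy word lists for illustration only; a real analysis would use
# full sentiment lexicons and the actual opposing candidates' names.
POSITIVE  = {"great", "better", "best", "hope", "win", "strong", "future"}
NEGATIVE  = {"no", "not", "never", "don't", "won't", "wrong", "fail"}
OPPONENTS = {"clinton", "trump", "obama", "romney"}   # example names

def winning_pattern_report(text, opponents=OPPONENTS):
    """Rates of positive/negative words and count of opponent mentions."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    return {
        "positive_rate": sum(w in POSITIVE for w in words) / n,
        "negative_rate": sum(w in NEGATIVE for w in words) / n,
        "opponent_mentions": sum(w in opponents for w in words),
    }
```

The winning profile is then a high positive rate, a negative rate of essentially zero, and no opponent mentions at all.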

What was surprising is how this particular pattern of success was learned and used. Although the pattern itself isn’t especially surprising, no candidate used it from the start; they all began with much more conventional patterns: negativity, content-filled policy statements, and comparisons between themselves and the other candidates. With success came a change in language, but the second part of the surprise is that the change happened, in every case, over a period of little more than a month. For some presidents, it happened around the time of their inaugurations; for others around the time of their second campaign, but it was never a gradual learning curve. This suggests that what happens is not a conscious or unconscious improved understanding of what language works, but rather a change of their view of themselves that allows them to become more statesmanlike (good interpretation) or entitled (bad interpretation). The reason that presidents are almost always re-elected is that they use this language pattern well in their second campaigns. (It’s not a matter of changing speechwriting teams or changes in the world and so in the topics being talked about — it’s almost independent of content.)

So there’s plenty of evidence that using language like this leads to electoral success but, just as for goaltenders, no candidate can bring himself or herself to use it, because they’d feel so silly if it didn’t work and they lost.

You heard it here first

As I predicted on August 8th, Obama has won the U.S. presidential election. The prediction was made based on his higher levels of persona deception, that is the ability to present himself as better and more wonderful than he actually is. Romney developed this a lot during the campaign and the gap was closing, but it wasn’t enough.

On a side note, it’s been interesting to notice the emphasis in the media on factual deception, and the huge amount of fact checking that they love to do. As far as I can tell, factual deception has at best a tiny effect on political success, whether because it’s completely discounted or because the effect of persona is so much stronger. On the record, it seems to me to be a tough argument that Obama has been a successful president, and indeed I saw numerous interviews with voters who said as much — but then went on to say that they would still be voting for him. So I’m inclined to the latter explanation.

Including the results of the third debate

Just a quick update from the persona deception rankings from yesterday, to include the text of the third debate (assuming that each statement is free form, which is slightly dubious).

Here’s the figure:

Persona deception scores after the third debate

You can see that they are running neck and neck when it comes to persona deception. Adding in the third debate changes the semantic space because the amount of text is so large compared to a typical campaign speech. The points corresponding to debates lie in the middle of the pack, suggesting that neither is trying too hard to present himself as better than he is — this is probably typical of a real-time adversarial setting where there aren’t enough cognitive resources to get too fancy.

Update on persona deception in the US presidential election

Recall that persona deception is the attempt, by a politician, to seem more wonderful than s/he actually is. It’s a form of deception, and can be detected fairly readily using the Pennebaker deception model. As I mentioned in the previous post, it relies on the ability to speak in a freeform (i.e. unprompted) way. However, both of the presidential debates so far have used the questions only as faint stimulators of pre-prepared talking points so I’m including them (but some caveats apply).

Here is the picture of the levels of persona deception over time, where I’ve changed to a more conventional red for Romney and blue for Obama.

red – Romney; blue – Obama

Apart from a few high-scoring speeches by Romney, there isn’t much separation between the two candidates. The differentiation between top-left and bottom-right is mostly driven by Obama’s heavy use of “I’m” and one or two other words that Romney doesn’t use much. The debates are labelled by the squares — it’s clear that there isn’t much difference between their debate speeches and their stump speeches, which is interesting because the former are less scripted.

However, this is a big change from my previous analysis just after the conventions. At that point Obama’s levels of persona deception were much higher than Romney’s. The change suggests that Romney has become much better at presenting the current persona (or, alternatively, that the persona he is now presenting is closer to the “real” man). Since the candidate who can best present a strong persona tends to win, this suggests that the candidates are much closer than they were.

We will see what the 3rd debate brings forth tonight…

Update of persona deception in the US presidential election to early September

These results are somewhat limited by the low number of Romney speeches that seem to be posted on the web; and do not include any of the convention speeches.
Recall that persona deception is the attempt by politicians to appear better than they are; that this is a form of deception detectable by the Pennebaker deception model; and that, all things being equal, the candidate who presents him/herself as most wonderful will win.
This figure shows the level of persona deception in speeches, where red is Obama and blue is Romney. The straight line is the axis of deception, with high scores at the red end and low scores at the green end.

The pattern is very much as it was a few months ago; and, on this evidence, Obama is still on track to win.

Here’s a great example of high scoring Obama:

“…you are aware that there is a pretty intense campaign going on right now. And the reason it’s intense is because the choice that we face in November could not be bigger. It’s not just a choice between two candidates. It’s not even just a choice between two parties. More than any election in recent memory, this is a choice between two fundamentally different paths for our country’s future.”

Not a first-person singular pronoun to be seen; hardly an exclusive word. And it could have been said by anyone.


Differentiating from the other candidate

One of the puzzles of the early phases of the 2012 election campaign was how little the candidates managed to differentiate themselves from one another.

Campaigns are a situation where getting daylight between your candidate and the other guys seems essential (and preferably in a good way). But not only did the Republican contenders all tend to use similar words to one another; they also used similar words to Obama. There was some indication that each had a home ground to which they constantly returned, but it wasn't different enough from everybody else's to differentiate them, certainly not to a human audience. (I'm talking about aspects of this analysis at the Foundations of Open Source Intelligence at the end of the month in Istanbul: politicians acting as surrogates for other highly motivated, sophisticated, well-funded persuaders.)

Now that the campaign has become a two-person one, there is differentiation in the language use of the two candidates, shown here:

The blue crosses are Obama speeches and the red ones Romney speeches. There are clear differences.

So the next question is: do these differences result from differences of content or differences of style? This turns out to be hard to answer. If we pick out particular classes of words (nouns, verbs, adjectives) then there’s more of an overlap, but still a visible difference. For example, here is the equivalent plot for just the nouns, which you would imagine would primarily capture differences in content:

This rather suggests that a big part of the difference is what the candidates are talking about. But when you dig into the data, it turns out that the differentiating nouns are not big content-filled nouns, but little ordinary nouns where the differences are as much about habits and taste as they are about content.
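To make that "dig into the data" step concrete, here is one simple way to rank the words that most separate two speakers; the ranking criterion (raw difference in per-word rates) and the example texts are illustrative, not the exact method used in the analysis above:

```python
import re
from collections import Counter

def word_rates(text):
    """Per-word frequency of each word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    return {w: c / total for w, c in Counter(words).items()}

def differentiators(text_a, text_b, top=5):
    """Words ranked by how much more often speaker A uses them than B.
    The most positive entries are characteristic of A; running the
    function with arguments swapped gives B's characteristic words."""
    ra, rb = word_rates(text_a), word_rates(text_b)
    vocab = set(ra) | set(rb)
    diffs = {w: ra.get(w, 0.0) - rb.get(w, 0.0) for w in vocab}
    return sorted(diffs, key=diffs.get, reverse=True)[:top]
```

Run over two real speech corpora, a ranking like this tends to surface exactly the effect described above: the top differentiators are usually small, ordinary words reflecting habit and taste rather than big content-filled nouns.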

Update — persona deception from May to early August in the US presidential race

If you’re a regular reader, you’ll know that I compute “persona deception” scores for political figures. These measure all kinds of deception but, in the political arena, most of the deception is about candidates portraying themselves as better, nicer, wiser, and more competent than they really are (rather than factual misstatements).

Now that the US presidential race is down to two, I've done the analysis on their available speeches from the beginning of May up to the present (early August). Obama has made many more speeches (I've included both 'campaign' and 'fundraiser' speeches; I don't know how he's found the time to do anything else, since there are apparently three or four speeches most days).

Here is the basic figure:

The line is the axis of scores, with high scores at the red end and low scores at the green end. The red crosses are Obama speeches, and the blue crosses Romney speeches. You can see that Obama’s scores (for example, projecting each point onto the line) are much higher. It seems to be the case that, all things being equal, the candidate with the higher persona deception scores wins an election. If this data holds up through the remaining 3 months, this can be considered a prediction. That’s certainly what happened in the 2008 cycle, which you can see by looking back in this blog.

For the technically minded: the two-factor structure here is often seen faintly, because an individual's use of markers such as first-person singular pronouns is often fairly uncorrelated with their use of exclusive words such as "but" and "or". It's more pronounced in this case because of Romney's high rate of use of "I", while Obama tends to prefer "I'm". Overall, Obama's high scores come from high rates of "I'm", high rates of "go" and "going", and low rates of "but" and "or". If you want to find out more: this analysis is based on James Pennebaker's deception model, which we've extended using a dimensionality reduction (so that scores are projections onto a set of eigenvectors rather than sums of marker frequencies).
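The dimensionality-reduction step can be sketched in a few lines: rows are speeches, columns are marker rates, and each speech's scores are the projections of its mean-centred row onto the leading eigenvectors (computed here via SVD). The 4x3 matrix below is made-up illustration, not real speech data:

```python
import numpy as np

# Rows = speeches, columns = marker rates (e.g. "I'm", "go/going", "but/or").
# These numbers are invented for illustration only.
X = np.array([
    [0.020, 0.015, 0.004],
    [0.018, 0.014, 0.005],
    [0.005, 0.004, 0.020],
    [0.006, 0.005, 0.018],
])

Xc = X - X.mean(axis=0)            # mean-centre each marker column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T             # project onto the first two eigenvectors

# Speeches with similar marker profiles land close together in the plot.
print(scores.shape)  # (4, 2)
```

Plotting the two columns of `scores` against each other gives figures like the ones in these posts, with the axis of deception running through the point cloud.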