Posts Tagged 'words'

Trump’s continuing success

As I posted earlier, our study of previous successful presidential candidates shows that success is very strongly correlated with a particular language model, consisting of:

  • Uniformly positive language
  • Complete absence of negative language
  • Using uplifting, aspirational metaphors rather than policy proposals, and
  • Ignoring the competing candidates

Trump presumably polls well, to a large extent, because he uses this language model (not so much ignoring of the competing candidates recently, but maybe that’s the effect of a primary). This language pattern tends to be used by incumbent presidents running for re-election, and seems to derive from their self-perception as already-successful in the job they’re re-applying for. Trump, similarly, possesses huge self confidence that seems to have the same effect — he perceives himself as (automatically, guaranteed) successful as president.

The self-perception-of-success issue and the competence issue were hard to separate before, and we’ve used ‘statesmanlike’ to describe the language model of electoral success. All of the presidential incumbents whom we previously studied had both a self-perception of success and demonstrated competence, and we assumed that both were necessary to deploy the required language comfortably and competently. Trump, however, shows that this isn’t so — it’s possible to possess the self-perception of success without the previously demonstrated competence. In Trump’s case, presumably, it derives from competence in a rather different job: building a financial empire.

The media is in a frenzy about the competence issue for Trump. But our language model explains how it is possible to be popular among voters without demonstrating much competence, or even planned competence, to solve the problems of the day.

Voters don’t care about objective competence in the way that the media do. They care about the underlying personal self-confidence that is revealed in each candidate’s language. The data is very clear about this.

It may even be the rational view that a voter should take. Presidents encounter, in office, many issues that they had not previously formulated a policy for, so self-confidence may be more valuable than prepackaged plans. And voters have learned that most policies do not get implemented in office anyway.

It’s silly to treat Trump as a front runner when no actual vote has yet been cast. But it wouldn’t be surprising if he continues to do well for some time.  Of the other candidates, only Christie shows any sense of the use of positive language but, as a veteran politician, he cannot seem to avoid the need to present policies.

The secret of Trump’s success

Looking at US presidential elections through the lens of empirical investigation of word use shows that there’s a pattern of language that is associated with electoral success. Those who use it win, and the difference in the intensity of the pattern correlates well with the margin of victory.

The effective pattern is, in a way, intuitive: use positive language, eliminate negative language completely, talk in the abstract rather than about specific policies, and pay no attention to the other candidates.

In other words, a successful candidate should appear “statesmanlike”.

Candidates find it extremely difficult to use this approach — they feel compelled to compare themselves to the other candidates, dragging in negativity, and to explain the cleverness of their policies. Only incumbent presidents, in our investigation, were able to use this language pattern reliably.

I listened to some of Trump’s speech in Texas last night, and I’ve come to see that the media are completely and utterly wrong about why he is doing so well in the polls. It’s not that he’s tapping into a vein of disaffection with the political system; it is that he’s using this language model. In previous cycles, it’s only been incumbent presidents who’ve had the self-confidence to use it, but Trump, of course, has enough self-confidence to start a retail business selling it.

Let’s look at the components of the model:

Positive language: Trump’s positivity is orders of magnitude above that of the other candidates, and in two ways. First, he is relentlessly positive about the U.S. and about the future (catchphrase: “we can do better”). Second, he’s positive about almost everyone he mentions (catchphrase: “he’s a great guy”).

Negative language: Trump doesn’t avoid negativity altogether, but he uses it cleverly. First, his individual negative targets are not the other candidates (by and large) but pundits — Karl Rove and George Will were mentioned last night, but I doubt if more than 1% of the audience could have identified either in a line-up; so this kind of negativity acts as a lightning rod, without making Trump seem mean. And the negative references to others lack the bitterness that often bleeds through in the negative comments of more typical candidates. Second, when he mentions negative aspects of the Obama administration and its policies and actions, he does it by implication and contrast (“that’s not what I would do”, “I could do better”).

Vision not policies: the media cannot stand that Trump doesn’t come out with detailed policy plans, but it’s been clear for a while that voters don’t pay a lot of attention to policies. They’ve learned that (a) there’s a huge gap between what a president wants to do and what he can actually make happen, and (b) policies are generated with one eye on the polls and focus groups, so they often aren’t something that the candidate has much invested in doing in the first place. [It’s incredible that Secretary Clinton ran focus groups to prep her “apology”, which was actually a meta-apology for not having apologized better or earlier.]

Trump has one huge “policy” advantage — he isn’t beholden to donors, and so is freer of the behind-the-scenes pressure that most candidates face. In the present climate, this has to be a huge selling point.

Ignore the other candidates: Trump doesn’t quite do this (and it gets him into trouble), but he’s learning fast — in last night’s speech, he only mentioned a handful of his competitors, and his comments about all of them were positive.

If Trump continues to give this kind of speech, then the more exposure he gets, the more voters are going to like him. I remain doubtful that he will be the Republican nominee, but I don’t see him flaming out any time soon. Even if he makes some serious gaffe, he’ll apologize in seconds and move on (in contrast to Clinton who seems determined to make acute issues into chronic ones).

Canadian election 2015: Leaders’ debate

Regular readers will recall that I’m interested in elections as examples of the language and strategy of influence — what we learn can be applied to understanding jihadist propaganda.

The Canadian election has begun, and last night was the first English-language debate by the four party leaders: Stephen Harper, Elizabeth May, Thomas Mulcair, and Justin Trudeau. Party leaders do not get elected directly, so all four participants had trouble wrapping their minds around whether they were speaking as party spokespeople or as “presidential” candidates.

Deception is a critical part of election campaigns, but not in the way that people tend to think. Politicians make factual misstatements all the time, but it seems that voters have already baked this in to their assessments, and so candidates pay no penalty when they are caught making such statements. This is annoying to the media outlets that use fact checking to discover and point out factual misstatements, because nobody cares, and they can’t figure out why.

Politicians also try to present themselves as smarter, wiser, and generally more qualified for the position for which they’re running, and this is a much more important kind of deception. In a fundamental sense, this is what an election campaign is — a Great White Lie. Empirically, the candidate who is best at this kind of persona deception tends to win.

Therefore, measuring levels of deception is a good predictor of the outcome of an election. Recall that deception in text is signalled by (a) reduced use of first-person singular pronouns, (b) reduced use of so-called exclusive words (“but”, “or”) that introduce extra complexity, (c) increased use of action verbs, and (d) increased use of negative-emotion words. This model can be applied by counting the occurrences of these words, adding them up (with appropriate signs), and computing a score for each document. But it turns out to be much more effective to add a step that weights each word by how much it varies in the set of documents being considered, and to compute this weighted score.
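For the technically minded, here is a minimal sketch of how such a weighted score could be computed. The word lists below are small illustrative fragments rather than the model’s actual vocabulary, and using the standard deviation across documents as the weight is just one plausible reading of the description above.

```python
# A minimal sketch of a signed, variability-weighted deception score.
# The word lists are illustrative fragments, not the full model vocabulary.
import numpy as np

FIRST_PERSON = {"i", "me", "my", "mine"}            # decrease with deception
EXCLUSIVE    = {"but", "or", "except", "without"}   # decrease with deception
NEGATIVE     = {"angry", "hate", "worthless"}       # increase with deception
ACTION       = {"go", "going", "take", "taking"}    # increase with deception

SIGNS = {w: -1 for w in FIRST_PERSON | EXCLUSIVE}
SIGNS.update({w: +1 for w in NEGATIVE | ACTION})
MODEL_WORDS = sorted(SIGNS)

def word_rates(doc):
    """Rate of each model word per 1000 tokens in one document."""
    tokens = doc.lower().split()
    n = max(len(tokens), 1)
    return np.array([1000.0 * tokens.count(w) / n for w in MODEL_WORDS])

def deception_scores(docs):
    """Signed, variability-weighted scores; higher means more deceptive (relatively)."""
    rates = np.vstack([word_rates(d) for d in docs])   # documents x words
    signs = np.array([SIGNS[w] for w in MODEL_WORDS])
    weights = rates.std(axis=0)                        # weight each word by how much it varies
    return (rates * signs * weights).sum(axis=1)
```

Sorting the documents by this score gives the relative ranking; the absolute values mean nothing on their own.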

So, I’ve taken the statements by each of the four candidates last night, and put them together into four documents. Then I’ve applied this deception model to these four documents, and ranked the candidates by levels of deceptiveness (in this socially acceptable election-campaign meaning of deceptiveness).

This figure shows, in the columns, the intensity of the 35 model words that were actually used, in decreasing frequency order. The rows are the four leaders in alphabetical order: Harper, May, Mulcair, Trudeau; and the colours are the intensity of the use of each word by each leader. The top few words are: I, but, going, go, look, take, my, me, taking, or. But remember, a large positive value means a strong contribution of this word to deception, not necessarily a high frequency — so the brown bar in column 1 of May’s row indicates a strong contribution coming from the word “I”, which actually corresponds to low rates of “I”.

This figure shows a plot of the variation among the four leaders. The line is oriented from most deceptive to least deceptive; so deception increases from the upper right to the lower left.

Individuals appear in different places because of different patterns of word use. Each leader’s point can be projected onto this line to generate a (relative) deception score.

May appears at the most deceptive end of the spectrum. Trudeau and Harper appear at almost the same level, and Mulcair appears significantly lower. The black point represents an artificial document in which each word of the model is used at one standard deviation above neutral, so it represents a document that is quite deceptive.

You might conclude from this that May managed much higher levels of persona deception than the other candidates and so is destined to win. There are two reasons why her levels are high: she said much less than the other candidates, so her results are distorted by the necessary normalizations; and she used “I” far less often than the others. Her interactions were often short as well, reducing the opportunities for some kinds of words to be used at all, notably the exclusive words.

Mulcair’s levels are relatively low because he took a couple of opportunities to talk autobiographically. This seems intuitively to be a good strategy — appeal to voters with a human face — but unfortunately it tends not to work well. To say “I will implement a wonderful plan” invites the hearer to disbelieve that the speaker actually can; saying instead “We will implement a wonderful plan” makes the hearer’s disbelief harder because they have to eliminate more possibilities; and saying “A wonderful plan will be implemented” makes it a bit harder still.

It’s hard to draw strong conclusions in the Canadian setting because elections aren’t as much about personalities. But it looks as if this leaders’ debate might have been a wash, with perhaps a slight downward nudge for Mulcair.

Election winning language patterns

One of the Freakonomics books makes the point that, in football aka soccer, a reasonable strategy when facing a free kick is to stay in the middle of the goal rather than diving to one side or the other — but goaltenders hardly ever follow this strategy because they look like such fools if it doesn’t pay off. Better to dive, and look as if you tried, even if it turns out that you dived the wrong way.

We’ve been doing some work on what kind of language to use to win elections in the U.S., and there are some similarities between the strategy that works and the goaltending strategy.

We looked at the language patterns of all of the candidates in U.S. presidential elections over the past 20 years, and a very clear language pattern for success emerged. Over all campaigns, the candidate who best deployed this language won, and the margin of victory relates quite strongly to how well the language was used (for example, Bush and Gore used this pattern at virtually identical levels in 2000).

What is this secret language pattern that guarantees success? It isn’t very surprising: high levels of positive words, non-existent levels of negative words, abstract words in preference to concrete ones, and complete absence of reference to the opposing candidate(s).

What was surprising is how this particular pattern of success was learned and used. Although the pattern itself isn’t especially surprising, no candidate used it from the start; they all began with much more conventional patterns: negativity, content-filled policy statements, and comparisons between themselves and the other candidates. With success came a change in language, but the second part of the surprise is that the change happened, in every case, over a period of little more than a month. For some presidents, it happened around the time of their inaugurations; for others around the time of their second campaign, but it was never a gradual learning curve. This suggests that what happens is not a conscious or unconscious improved understanding of what language works, but rather a change of their view of themselves that allows them to become more statesmanlike (good interpretation) or entitled (bad interpretation). The reason that presidents are almost always re-elected is that they use this language pattern well in their second campaigns. (It’s not a matter of changing speechwriting teams or changes in the world and so in the topics being talked about — it’s almost independent of content.)

So there’s plenty of evidence that using language like this leads to electoral success but, just as for goaltenders, no candidate can bring himself or herself to use it, because they’d feel so silly if it didn’t work and they lost.

Verbal mimicry isn’t verbal (well, not lexical anyway)

One of my students, Carolyn Lamb, has been looking at deception in interrogation settings.

The Pennebaker model of deception, as devoted readers will know, is robust only for freeform documents. Sadly, the settings in which deception is often most interesting tend to be dialogues (law enforcement, forensic) and it’s known that the model doesn’t extend in any straightforward way to such settings.

We started out with the idea that responses would be mixtures of language elicited by the words in a question and freeform language from the respondent, and developed a clever method to separate them. The method worked but, sadly, it didn’t help. When the effect of question language was removed from answers, the differences between deceptive and truthful responses decreased.

Digging a little deeper, we were able to show that the influence of words from the question must impact response language at a higher level (i.e. earlier in the answer-construction process than the purely lexical). Those who are being deceptive respond in qualitatively different ways to prompting words than those being truthful. A paper about this has been accepted for the IEEE Intelligence and Security Informatics Conference in Seattle next month.

Part of the explanation seems to be mirror neurons. There’s a considerable body of work on language acquisition, and on responses to single words, that uses mirror neurons as a big part of the explanation; I haven’t seen anything at an intermediate level where these results fit.

There are some interesting practical applications for interrogators. One strategy would be to reduce the presence of prompting words (and do so consistently across all subjects) so that responses become closer to statements, and so closer to freeform. My impression, from my acquaintance with law enforcement, is that the smarter personnel already know this and act on it.

But our results also suggest a new strategy: increase the number of prompting words because that tends to increase the separation between the deceptive and the truthful. This needs a good understanding of what kinds of response words to look for (and, for most, this has to be done offline because we as humans are terrible at estimating rates of words in real-time, especially function words). But it could be very powerful.

You heard it here first

As I predicted on August 8th, Obama has won the U.S. presidential election. The prediction was made based on his higher levels of persona deception, that is the ability to present himself as better and more wonderful than he actually is. Romney developed this a lot during the campaign and the gap was closing, but it wasn’t enough.

On a side note, it’s been interesting to notice the emphasis in the media on factual deception, and the huge amount of fact checking that they love to do. As far as I can tell, factual deception has at best a tiny effect on political success, whether because it’s completely discounted or because the effect of persona is so much stronger. On the record, it seems to me to be a tough argument that Obama has been a successful president, and indeed I saw numerous interviews with voters who said as much — but then went on to say that they would still be voting for him. So I’m inclined to the latter explanation.

Including the results of the third debate

Just a quick update to the persona deception rankings from yesterday, to include the text of the third debate (assuming that each statement is freeform, which is slightly dubious).

Here’s the figure:

Persona deception scores after the third debate

You can see that they are running neck and neck when it comes to persona deception. Adding in the third debate changes the semantic space because the amount of text is so large compared to a typical campaign speech. The points corresponding to debates lie in the middle of the pack, suggesting that neither is trying too hard to present themselves as better than they are — this is probably typical of a real-time adversarial setting where there aren’t enough cognitive resources to get too fancy.

Update on persona deception in the US presidential election

Recall that persona deception is the attempt, by a politician, to seem more wonderful than s/he actually is. It’s a form of deception, and can be detected fairly readily using the Pennebaker deception model. As I mentioned in the previous post, it relies on the ability to speak in a freeform (i.e. unprompted) way. However, both of the presidential debates so far have used the questions only as faint stimulators of pre-prepared talking points so I’m including them (but some caveats apply).

Here is the picture of the levels of persona deception over time, where I’ve changed to a more conventional red for Romney and blue for Obama.

red – Romney; blue – Obama

Apart from a few high-scoring speeches by Romney, there isn’t much separation between the two candidates. The differentiation between top-left and bottom-right is mostly driven by Obama’s heavy use of “I’m” and one or two other words that Romney doesn’t use much. The debates are labelled by the squares — it’s clear that there isn’t much difference between their debate speeches and their stump speeches, which is interesting because the former are less scripted.

However, this is a big change from my previous analysis just after the conventions. At that point Obama’s levels of persona deception were much higher than Romney’s. The change suggests that Romney has become much better at presenting his current persona (or, alternatively, that the persona he is now presenting is closer to the “real” man). Since the candidate who can best present a strong persona tends to win, this suggests that the candidates are much closer than they were.

We will see what the 3rd debate brings forth tonight…

Deception in the US Presidential Debates

You might be wondering if I’m going to be posting scores for the levels of persona deception in this evening’s presidential debate (and subsequent ones).

There’s a problem: the deception model relies on the rates at which certain kinds of words occur. In question-and-answer situations such as interrogations, and debates, the language of the questions drives, to some extent, the language of the answer. So we can’t get a clean read on the level of persona deception of the respondent without factoring out that part of the response that doesn’t come, so to speak, from inside the respondent’s head.

We can’t do this factoring yet, although we are making some progress. One of my students has developed a technique for “correcting” the word frequencies in an answer to allow for the prompting effects of words in the question. For example, using “you” in a question tends, not surprisingly, to alter the rates of pronouns such as “I” in the answer. The problem is complicated by the fact that the effects of the prompting don’t seem to be independent of the mental state of the respondent, something that others have noticed in forensic settings.
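To make the idea concrete, here is a hedged sketch of one shape such a correction could take: regress each answer word’s rate on the question’s word rates across many question-answer pairs, and keep the residual as the “corrected” rate. The post doesn’t spell out the actual technique, so treat this purely as an illustration of the kind of factoring involved.

```python
# Illustrative only: one plausible way to remove a linear "prompting" effect of
# question words from answer word rates. Not the actual technique described in
# the post, whose details are not given here.
import numpy as np

def corrected_rates(question_rates, answer_rates):
    """
    question_rates, answer_rates: (n_pairs x n_words) arrays of word rates
    for matched question-answer pairs.
    Returns answer rates with the fitted linear effect of the question removed.
    """
    ones = np.ones((question_rates.shape[0], 1))
    Q = np.hstack([question_rates, ones])                    # add an intercept column
    coef, *_ = np.linalg.lstsq(Q, answer_rates, rcond=None)  # least-squares fit
    prompted = Q @ coef                                      # part explained by the question
    return answer_rates - prompted                           # residual: the "freeform" part
```

The residual is what one would then feed into the deception model in place of the raw answer rates.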

So the bottom line is that the deception model, though effective in freeform situations such as speeches, remains problematic in interrogatory settings. The effect of a question seems to die away about 50 words into an answer, so there will be opportunities to look at levels of persona deception in longer responses, of which there will probably not be a shortage.

Differentiating from the other candidate

One of the puzzles of the early phases of the 2012 election campaign was how little the candidates managed to differentiate themselves from one another.

Campaigns are a situation where getting daylight between your candidate and the other guys seems essential (and preferably in a good way). But not only did the Republican contenders all tend to use similar words, they all used words similar to Obama’s. There was some indication that each had a home ground to which they constantly returned, but it wasn’t different enough from everybody else’s to differentiate them, certainly not to a human audience. (I’m talking about aspects of this analysis at the Foundations of Open Source Intelligence at the end of the month in Istanbul — politicians acting as surrogates for other highly motivated, sophisticated, well-funded persuaders.)

Now that the campaign has become a two-person one, there is differentiation in the language use of the two candidates, shown here:

The blue crosses are Obama speeches and the red ones Romney speeches. There are clear differences.

So the next question is: do these differences result from differences of content or differences of style? This turns out to be hard to answer. If we pick out particular classes of words (nouns, verbs, adjectives) then there’s more of an overlap, but still a visible difference. For example, here is the equivalent plot for just the nouns, which you would imagine would primarily capture differences in content:

This rather suggests that a big part of the difference is what the candidates are talking about. But when you dig into the data, it turns out that the differentiating nouns are not big content-filled nouns, but little ordinary nouns where the differences are as much about habits and taste as they are about content.

Update — persona deception from May to early August in the US presidential race

If you’re a regular reader, you’ll know that I compute “persona deception” scores for political figures. These measure all kinds of deception but, in the political arena, most of the deception is about candidates portraying themselves as better, nicer, wiser, and more competent than they really are (rather than factual misstatements).

Now that the US presidential race is down to two, I’ve done the analysis on their available speeches from the beginning of May up to the present (early August). Obama has made many more speeches than Romney (I’ve included both ‘campaign’ and ‘fundraiser’ speeches — I don’t know how he’s found the time to do anything else, since apparently there are three or four speeches most days).

Here is the basic figure:

The line is the axis of scores, with high scores at the red end and low scores at the green end. The red crosses are Obama speeches, and the blue crosses Romney speeches. You can see that Obama’s scores (for example, projecting each point onto the line) are much higher. It seems to be the case that, all things being equal, the candidate with the higher persona deception scores wins an election. If this data holds up through the remaining 3 months, this can be considered a prediction. That’s certainly what happened in the 2008 cycle, which you can see by looking back in this blog.

For the technically minded, the two-factor structure here is often seen faintly because an individual’s use of markers such as first-person singular pronouns is often fairly uncorrelated with their use of exclusive words such as “but” and “or”. It’s more pronounced in this case because of Romney’s high rate of use of “I” while Obama tends to prefer “I’m”. Overall, Obama’s high scores come from: high rates of “I’m”, high rates of “go” and “going”, and low rates of “but” and “or”. If you want to find out more, this analysis is based on James Pennebaker’s deception model, which we’ve extended by using a dimensionality reduction (so that scores are projections onto a set of eigenvectors rather than sums of marker frequencies).
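As a rough sketch of that extension, assuming the marker rates have already been assembled into a documents-by-words matrix (and sign-adjusted as in the basic model), the scores become projections onto the leading eigenvectors rather than plain weighted sums.

```python
# A minimal sketch of the dimensionality-reduction step, assuming `rates` is a
# documents x words matrix of (sign-adjusted) marker rates already computed.
import numpy as np

def projected_scores(rates, k=2):
    """Project each document onto the top-k principal directions of the marker space."""
    centered = rates - rates.mean(axis=0)                     # remove the corpus norm
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)   # eigenvectors via SVD
    return centered @ Vt[:k].T                                # documents x k coordinates
```

With k=1 this gives a single deception axis; with k=2 it gives two-dimensional plots like the ones in these posts, where (roughly) one direction is dominated by pronoun use and the other by the exclusive words.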

Presidential speech word patterns

In the continuing saga of presidential campaign speech language, I’ve been analyzing parts of speech that don’t get much attention such as verbs, adverbs, and adjectives. Looking at the way in which each candidate uses such words over time turns up some interesting patterns. I don’t understand their deep significance, but there’s some work suggesting that variability in writing is a sign of health; and Ashby’s Law of requisite variety can be interpreted to mean that the actor in a system with the most available options tends to control the system.

Here are the plots of adjective use (in a common framework) for the 2008 and 2012 candidates (up to the time that Santorum dropped out of the race).

It’s striking how much the patterns over time form a kind of spiral, moving from one particular combination of adjectives to another and another and eventually back to the original pattern. The exception is Obama who displays a much more radial structure, with an adjective combination that he uses a lot, and occasional deviations to something else, but a rapid return to his “home ground”.

You can see (the extremal set of) adjectives and their relationships in this figure:

You can see that they form 3 poles: on the left, adjectives associated with energy policy; at the bottom, adjectives associated with patriotism; and on the right, adjectives associated with defence [yes, it is spelled that way]. This figure can be overlaid on those of the candidates to get a sense of which poles they are visiting. For example, Obama’s “home ground” is largely associated with the energy-related adjectives.

Comparing content in the US presidential campaign 2008 vs 2012

I posted about the content in the 2012 presidential campaign speeches. It’s still relatively early in the campaign so comparisons aren’t necessarily going to reveal a lot, but I went back and looked at the speeches in 2008 by Hillary Clinton, McCain, and Obama; and compared them to the four remaining Republican contenders and President Obama so far this year.

Here’s the result of looking just at the nouns:

The key is: Clinton — magenta circles; Obama 2008 — red circles; McCain — light blue stars; Gingrich — green circles; Paul — yellow circles; Romney — blue circles; Santorum — black circles; Obama 2012 — red squares.

Recall that the way to interpret these plots is that points far from the origin are more interesting speeches (in the sense that they use more variable word patterns) while different directions represent different “themes” in the words used.

The most obvious difference is that the topics talked about were much more wide-ranging in 2008 than they have been this year. This may be partly because of the early stage of the campaign, the long Republican primary season keeping those candidates focused on a narrow range of topics aimed at the base, or a change in the world that has focused our collective attention on different, and fewer, topics.

This can be teased out a bit by looking at the words that are associated with each direction and distance. The next figure shows the nouns that were actually used (only those that are substantially above the median level of interestingness are labelled):

You can see that there are four “poles” or topics that differentiate the speech content. To the right are words associated with the economy, but from a consumer perspective. At the bottom are words associated with energy. To the left are actually two groups of words, although they interleave a little. At the lower end are words associated with terrorism and the associated wars and threats. At the upper end are words associated with the human side of war and patriotism.

These two figures can be lined up with each other to get a sense of which candidates are talking about which topics. The 2012 speeches and Obama’s 2008 speeches all lean heavily towards the economic words. In 2008, McCain and Clinton largely talked about the war/security issues, with a slight bias by Clinton towards the patriotism cluster.

Obama’s 2012 speeches tend towards the energy cluster but, at this point, quite weakly given the overall constellation of topics and candidates.

The other thing that is noticeable is how similar the topics for some of the Republican contenders are: their speeches cluster quite tightly.

Negative words in the campaign

Yesterday we looked at the use of positive words in the campaign. Today, I want to present the use of negative words.

We saw that President Obama is much better at using positive words than the Republican contenders; but they are all about the same at using negative words. Note that these two flavors of words are not necessarily opposites; someone can use both positive and negative words at high rates (although that itself might be interesting).

Here are the speeches according to their patterns of negative word use:

Again, distance from the origin indicates intensity of negative word use, and direction indicates different words being used.

Romney has the strongest use of negative words (and the associated words are ones like “disappointments” and “worrying”). Ron Paul also has quite strong use of negative words. His word choices are quite different from those of the other candidates, though; they include “bankrupt”, “flawed” and “inconvenient”.

President Obama and Gingrich have moderate levels of negative word use; the most popular word for both of them is “problem”, followed by “challenge”.

Santorum has the lowest levels of negative word use of all five of them.

The differences are interesting because they shed some light on how each candidate views those aspects of the situation that are not favorable to them. Obama and Gingrich have a more proactive view: negatives to them are problems. The other candidates have a more outward focus on the source of difficulties and, at the same time, a more negative inward focus; that is, they use negative words that reflect how they feel about themselves.

I also ran an experiment weighting the positive words positively and the negative words negatively, to see if there is any ranking from, as it were, most positive person to most negative person. It turns out that there isn’t such a ranking. All of them use mixtures of positive and negative words, different mixtures for each, but all with about the same ratio of positivity to negativity.

Positive words in the campaign

Yesterday I posted about the content of the speeches of the campaigners for the 2012 presidential election cycle: the Republican contenders and President Obama. Today I have similar results for the use of positive words.

Here are the speeches:

The figure should be interpreted like this: distance from the origin indicates intensity of positive word use; direction indicates the use of a different set of positive words. So President Obama is much more positive than the Republican contenders, of whom Gingrich is noticeably more positive than the rest. These are based only on the use of positive words, so a placement close to the origin should be interpreted as the absence of positive words, not any kind of negativity (stay tuned). In other words, speeches near the origin are not positive (they could be either neutral or negative but this analysis can’t differentiate).

Some of the positive words associated with President Obama are: “profitable”, “creative”, “efficiency” and “outstanding”.

Some of the positive words associated with Gingrich are: “tremendous”, “optimistic”, “gains”, “happiness”, and “positive” itself.

You can see why the Republican approval numbers are dropping — people pick up on the tone of speeches, and they are attracted to positive language, which they aren’t getting. Even Gingrich’s positive words are mostly about the (perceived) improvement in his chances, not in the wider US situation.

2012 US Election, Republicans plus President Obama

Yesterday I posted details about the levels of persona deception in the speeches by the Republican candidates since the beginning of 2012. In striking contrast to the 2008 cycle, the speeches fall along a single axis, indicating widespread commonalities in the way that they use words, particularly the words of the deception model.

Today I’ve included President Obama’s speeches this year in the mix. I’ve tried to select only those speeches where there was an audience. Of course, for a sitting president, the distinction between an ordinary speech and a campaign speech is difficult to draw. Almost all of these are labelled as campaign events at whitehouse.gov.

Here is the plot of the persona deception levels, with Obama’s speeches added in magenta.

Generally speaking, Obama’s levels of persona deception (see yesterday’s post to be clear on what this means) are in the low range compared to the Republican presidential candidates. This is quite different from what happened in the 2008 cycle, where his levels were almost always well above those of McCain and Clinton. It’s not altogether surprising, though. First, he can no longer be the mirror in which voters see what they want to see since he has a substantial and visible track record. Second, he doesn’t have to try as hard to project a persona (at least at this stage of the campaign) since he has no competitor. I expect that his values will climb as the campaign progresses, particularly after the Republican nominee becomes an actual person and not a potential one.

The interesting point is the outlier at the top left of the figure. This is Obama’s speech to AIPAC. Clearly this is not really a campaign speech, so the language might be expected to be different. On the other hand, if it were projected onto the single-factor line formed by the other speeches, it would be much more towards the deceptive end of that axis. Since the underlying model detects all kinds of deception, not just that associated with persona deception in campaigns, this may be revealing of the attitude of the administration to the content expressed in this speech.

Republican presidential candidates — first analysis of persona deception

Regular readers of this blog will know that I carried out extensive analysis of the speeches of the contenders in the 2008 US presidential election cycle (see earlier postings). I’m now beginning similar analysis for the 2012 cycle, concentrating on the Republican contenders for now.

You will recall that Pennebaker’s deception model enables a set of documents to be ranked in order of their deceptiveness, detected via changes in the frequency of occurrence of 86 words in four categories: first-person singular pronouns, exclusive words, negative-emotion words, and action verbs. Words in the first two categories decrease in the presence of deception, while those in the last two categories increase. The model only allows for ranking, rather than true/false determination, because “increase” and “decrease” are always relative to some norm for the set of documents being considered.
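For concreteness, here is a minimal sketch of that ranking-only behaviour: each word’s rate is expressed relative to the norm of the document set, signed by its category, and the documents are then ordered by the total. The category labels and signs follow the description above; the actual 86-word vocabulary isn’t reproduced here.

```python
# A sketch of ranking documents by relative deceptiveness. Word categories and
# their signs follow the model description; the vocabulary itself is omitted.
import numpy as np

CATEGORY_SIGN = {
    "first_person": -1,   # first-person singular pronouns decrease with deception
    "exclusive":    -1,   # exclusive words decrease with deception
    "negative":     +1,   # negative-emotion words increase with deception
    "action":       +1,   # action verbs increase with deception
}

def rank_by_deceptiveness(rates, word_categories):
    """
    rates: documents x words array of word rates.
    word_categories: category name for each word column.
    Returns document indices ordered from least to most (relatively) deceptive.
    """
    # "Increase" and "decrease" are relative to the norm of this document set,
    # so standardize each word against the set before applying the signs.
    z = (rates - rates.mean(axis=0)) / (rates.std(axis=0) + 1e-9)
    signs = np.array([CATEGORY_SIGN[c] for c in word_categories])
    scores = (z * signs).sum(axis=1)
    return np.argsort(scores)
```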

How does this apply to politics? First of all, the point isn’t to detect when a politician is lying (Cynical joke: Q: How do you tell when a politician is lying? A: His lips are moving). Politicians tell factual lies, but this seems to have no impact on how voters perceive them, perhaps because we’ve come to expect it. Rather, the kind of deception that is interesting is the kind where a politician is trying to present him/herself as a much better person (smarter, wiser, more competent) than they really are. This is what politicians do all the time.

Why should we care? There are two reasons. The first is that it works — typically the politician who is able to deliver the highest level of what we call “persona deception” gets elected. Voters have to decide on the basis of something, and this kind of presentation as a great individual seems to play more of a role than, say, actual plans for action.

Second, though, watching the changes in the levels of persona deception gives us a window into how each candidate (and campaign) is perceiving themselves (and, it turns out, their rivals) from day to day. Constructing and maintaining an artificial persona is difficult and expensive. Levels of persona deception tend to drop sharply when a candidate becomes confident that they’re doing well; and when some issue surfaces about which they don’t really have a persona opinion because, apparently, it takes time to construct the new piece.

So, with that preliminary, on to some results.

The figure shows the speeches in a space where speeches with greater persona deception (spin) are further to the right, and those with less persona deception are further to the left. Ron Paul shows the lowest level of persona deception, which is not surprising — nobody has ever accused him of trying to be what he is not. In contrast, Romney shows the highest level of persona deception — again not surprising, as he has had to try hardest to make himself appealing to voters. Note that this also predicts that he will do well. Both Gingrich and Santorum occupy the middle ground; both are running on a very overt track record and are not trying as hard to make themselves seem different from who they are. Indeed, candidates with a strong history tend to have lower levels of persona deception simply because it’s very difficult to construct a new, more attractive persona when you already have a strong one. (The two points vertically separated from the rest are the result of a sudden burst of using “I’d” in these two speeches.)

The following figures break out the temporal patterns for the four candidates:

What’s striking about Romney is how much the level of persona deception changes from speech to speech. In the last election cycle, this wasn’t associated with audience type or recent success but seemed to be much more internally driven. This zig-zag pattern is much more the norm than a constant level of persona deception — some mystery remains.

Metaphors and counterterrorism

The Intelligence Advanced Research Projects Activity (IARPA) has a call out for proposals to develop a system that will extract metaphors from text. The assumption is that the metaphors that are used in a document, or a community, reflect a way of viewing and organizing the world that can provide a higher-level way to understand other (sub)cultures. This seems like a very difficult challenge, which is exactly the kind of problem that funding agencies modelled on DARPA are supposed to pose.

I remember reading a paper that Charles Williams presented to the Inklings (the Oxford society that included C.S. Lewis, Tolkien, and other high fliers) in which he talked about just how difficult it is to understand what a metaphor does (I haven’t been able to find either the paper or a reference to it). Similes are (by comparison) straightforward; when we say “A is like B” we draw attention to or highlight some aspect of B that is similar to some aspect of A, and therefore emphasize some aspect of A, perhaps one that isn’t obvious.

A metaphor is a much more difficult object. When we say “A is B” we could take the view that this is just a more obscure kind of simile, in which the reader/hearer is invited to conceive of the possible similarity without a hint from the writer/speaker. But Williams argues, and I agree, that more is going on here. For a start, metaphors are not symmetric: if I say “A is B” it’s often nonsense to say “B is A”, whereas similes usually are symmetric. Often there is no obvious and straightforward way to reduce a metaphor to a simile; that is, there is no small set of properties common to A and B. And yet metaphors can be powerful.

There is a little relevant work in psychology, most of it associated with Judy DeLoache and what’s called the Dual Representation Hypothesis. Roughly speaking, the idea is that brains are well-equipped to represent symbols and the things they denote and to map computations on the symbols to computations on the denoted things in usable ways (apologies to psychologists for this mangled and computational perspective).  This goes some way to explain abstract reasoning, with some very nice experiments with young children showing when various levels of sophistication kick in; but it might also provide some explanatory power for metaphors. Unfortunately, there is some evidence that the more black-box the symbol, the more usable it is, which is evidence against this being a useful explanation for metaphors.

I won’t be applying for funding to work on this — but I’ll be watching the results with interest.

And Williams’ conclusion — that metaphors are something like a legal fiction — is one I didn’t find very convincing at the time I read the article, and still don’t.

Deception scores for the UEA emails

I’ve also calculated the deception scores for the UEA “climategate” emails, using the same methodology that I’ve written about in the context of the speeches of presidential candidates.

This doesn’t (yet) give any great results. This is partly because deception scores can only be computed for sets of similar documents. The UEA emails, however, fall into two broad classes: simple emails, and discussions and suggestions about more formal documents (papers and grant proposals). The language in these two classes is quite different, which makes them difficult to compare. For example, the base rates of first-person singular pronouns are very different.

What I have done is to see whether there are any patterns in deception scores with time. A strong change in either class of email should show up as a variation of score with time. The result is shown below, with the deception score axis running from right (low) to left (high), and the markers getting lighter with the passage of time.

Deception scores of UEA emails

The only thing that strikes me so far is that many of the emails with low deception scores are among the older ones. This might be taken to indicate some kind of change in the language patterns of these email users.

The released emails are a small and not very random set of all of the emails sent by these individuals. So not too much should be read into this plot.

“I” versus “we”

A number of people have noticed that there are substantial differences in the way Hillary Clinton and Barack Obama use pronouns. This is part, but not the whole, of the story about deceptiveness and spin in communication, as I’ve talked about in earlier postings.

The conventional explanation goes something like this:

“Clinton says ‘I’ a lot because she’s egotistical, or because she plans to get things done, with or without the help of other people. Obama says ‘we’ a lot because he is inclusive and wants to develop a shared dream that all can be involved in.”

This view is completely wrong, although it’s probably true that Obama, at some time, developed his use of “we” because he thought or was told that it would create an inclusive impression.

The reality is almost exactly the opposite. People who use first-person singular pronouns (“I”, “me”, “my”) do so, unconsciously, because they are being open, warm, and low status.

People who use first-person plural pronouns (“we”, “us”, “ours”) are not being as open, especially if the speaker is a man. Men use such pronouns as a velvet glove around an iron fist, a way to command without the appearance of command.

How do I know this? The models of language have been derived empirically, much of it the work of James Pennebaker and his group. They have studied the language that people use in known situations, and derived word usage profiles that have enabled them to draw these kinds of conclusions.

Obama uses the language he does because it works — people do perceive him as inclusive. But that doesn’t mean that that’s his real view of the situation. Clinton has adopted, consciously or not, an approach that presents her real self much more directly.

It’s not all about pronouns. Other important components are exclusive words (“but”, “nor” — which both McCain and Clinton have used in the past week!); negative-emotion words (“angry”); and action verbs (“going”).