Finding deception in text

When people speak or write deceptively, their word usage changes in characteristic ways. My work builds on a model of deception developed by James Pennebaker at the University of Texas at Austin.

Pennebaker’s model was developed empirically, by asking people to write in support of ideas that they did and didn’t believe in. As a result, it works best when applied to statements where the speaker or author is free to express him- or herself without constraints. In other situations, for example question and answer, testimony, business writing, and journalism, the model needs modification. For example, in the presidential debates, it’s much easier to detect how much deceptiveness (spin) is happening in the opening statements than in the responses to questions.

Deception results in:

  • Reduced frequency of first-person singular pronouns, perhaps as a way of increasing distance from, or reducing ownership of, the content. In contrast, increased frequency of first-person pronouns signals warmth.
  • Reduced use of exclusive words, words like “but”, “or”, “however”, and “whereas”. These words introduce parts of sentences that modify the rest and so are markers for increased complexity. Those who are being deceptive are, to some extent, controlling the content in a way that doesn’t happen in completely spontaneous speech, and this takes cognitive resources. The extra processing required seems to come at the expense of the complexity of the content.
  • Increased use of negative emotion words such as “anger”. This seems to be because of some sense of societal disapproval of deception that colours the speaker’s mental landscape.
  • Increased use of motion verbs. This may be because of an attempt to keep the narrative moving and so distract from any inadvertent flaws; or it may be a side-effect of decreasing the overall complexity. Verbs like “go” and “going” are particularly important in this class.

We are not good at controlling our use of these classes of words, nor are we any good at noticing their frequencies, whether listening or reading. That’s mostly why the existence of this side-channel of information about a speaker’s mental state has not been noticed until recently.

Computer programs are, of course, very good at computing frequencies and so they can be used to compute deception scores for chunks of text very easily.
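
As a rough illustration, a score of this kind could be sketched in Python along the following lines. This is not the model's actual lexicon or scoring rule: the word lists are tiny placeholders built around the examples above, and the formula (frequencies of the classes that rise under deception, minus frequencies of the classes that fall) is an assumption made purely for the sketch.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; a real lexicon would be far larger.
WORD_CLASSES = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "exclusive": {"but", "or", "however", "whereas"},
    "negative_emotion": {"anger", "angry", "hate", "afraid", "sad"},
    "motion": {"go", "going", "goes", "went", "move", "run", "walk"},
}

# Classes whose frequency is described as falling or rising under deception.
DECREASE = ("first_person", "exclusive")
INCREASE = ("negative_emotion", "motion")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    return re.findall(r"[a-z']+", text)


def class_frequencies(text):
    """Return each class's share of the total token count."""
    tokens = tokenize(text)
    if not tokens:
        return {name: 0.0 for name in WORD_CLASSES}
    counts = Counter(tokens)
    return {
        name: sum(counts[word] for word in words) / len(tokens)
        for name, words in WORD_CLASSES.items()
    }


def deception_score(text):
    """Assumed scoring rule: rising classes minus falling classes."""
    freqs = class_frequencies(text)
    return sum(freqs[c] for c in INCREASE) - sum(freqs[c] for c in DECREASE)


candid = "I thought about it, but I was wrong, or at least my first guess was."
evasive = "We were going to go there and then things went on and everyone was angry."
print(deception_score(candid), deception_score(evasive))
```

With these placeholder lists the second text scores higher than the first. A raw score like this only means something relative to other chunks of text of the same kind, so in practice it would be used to rank or compare texts rather than to label any single one as deceptive.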


