Posts Tagged 'textual analysis'

Advances in Social Network Analysis and Mining Conference — Sydney

This conference will be in Sydney in 2017, from 31st July to 3rd August.

As well as the main conference, there is also a workshop, FOSINT: Foundations of Open Source Intelligence, which may be of even more direct interest for readers of this blog.

Also I will be giving a tutorial on Adversarial Analytics as part of the conference.

6.5/7 US presidential elections predicted from language use

I couldn’t do a formal analysis of Trump/Clinton language because Trump didn’t put his speeches online — indeed many of them weren’t scripted. But, as I posted recently, his language was clearly closer to our model of how to win elections than Clinton’s was.

So since 1992, the language model has correctly predicted the outcome, except for 2000 when the model predicted a very slight advantage for Gore over Bush (which is sort of what happened).

People judge candidates on who they seem to be as a person, a large part of which is transmitted by the language they use. Negative and demeaning statements obviously affect this, but so does positivity and optimism.

Voting is not rational choice

Pundits and the media continue to be puzzled by the popularity of Donald Trump. They point out that much of what he says isn’t true, that his plans lack content, that his comments about various subgroups are demeaning, and so on, and so on.

Underlying these plaintive comments is a fundamental misconception about how voters choose the candidate they will vote for. This has much more to do with standard human, in the first few seconds, judgements of character and personality than it does about calm, reasoned decision making.

Our analysis of previous presidential campaigns (about which I’ve posted earlier) makes it clear that this campaign is not fundamentally different in this respect. It’s always been the case that voters decide based on the person who appeals to them most on a deeper than rational level. As we discovered, the successful formula for winning is to be positive (Trump is good at this), not to be negative (Trump is poor at this), not to talk about policy (Trump is good at this), and not to talk about the opponent (Trump is poor at this). On the other hand, Hillary Clinton is poor at all four — she really, really believes in the rational voter.

We’ll see what happens in the election this week. But apart from the unusual facts of this presidential election, it’s easy to understand why Trump isn’t doing worse and Hillary Clinton isn’t doing better from the way they approach voters.

It’s not classified emails that are the problem

There’s been reporting that the email trove, belonging to Huma Abedin but found on the laptop of her ex-husband, got there as the result of automatic backups from her phone. This seems plausible; if it is true then it raises issues that go beyond whether any of the emails contain classified information or not.

First, it shows how difficult it is for ordinary people to understand, and realise, the consequences of their choices about configuring their life-containing devices. Backing up emails is good, but every user needs to understand what that means, and how potentially invasive it is.

Second, to work as a backup site, this laptop must have been Internet-facing and (apparently) unencrypted. That means that more than half a million email messages were readily accessible to any reasonably adept cybercriminal or nation-state. If there are indeed classified emails among them, then that’s a big problem.

But even if there are not, access to someone’s emails, given the existence of textual analytics tools, means that a rich picture can be built up of that individual: what they are thinking about, who they are communicating with (their ego network in the jargon), what the rhythm of their day is, where they are located physically, what their emotional state is like, and even how healthy they are.

For any of us, that kind of analysis would be quite invasive. But when the individual is a close confidante of the U.S. Secretary of State, and when many of the emails are from that same Secretary, the benefit of a picture of them at this level of detail is valuable, and could be exploited by an adversary.

Lawyers and the media gravitate to the classified information issue. This is a 20th Century view of the problems that revealing large amounts of personal text cause. The real issue is an order of magnitude more subtle, but also an order of magnitude more dangerous.

The real problem with the Clinton email server

Every intelligence person I’ve talked to has told me that the probability that the Russians and Chinese (at least) hacked Hillary Clinton’s email server is 100%.

While the question of whether any of the emails were classified, about to be classified, or should have been classified is interesting, the real risk created by the use of this server is that it provided a real-time look at the communications of the Secretary of State (and the people she was talking to).

Even the unclassified emails provided insight into the Secretary’s state of mind, plans, location, and intentions. Some of these might have been obvious; others would follow from examining email headers; and others by carrying out textual analysis (which is getting quite good at reverse engineering mental state, as regular readers will know).

Access to your entire email stream + some analytic capacity = fairly complete understanding of your life.

(Note that Google already does this for everyone who has a gmail account, and also for anyone who sends or receives email from anyone with a gmail account.)

Added 2016/05/06: A new problem now arises: control of the presidential election is in the hands of any country that can claim to have hacked the server. While hacking by a foreign power remains a (virtually certain) hypothetical, it is clearly having no impact on the election. But if a foreign power were to leak that they had hacked the server and exploited that somehow, the impact will surely be catastrophic. And I can imagine several of America’s enemies who might prefer a President Trump to a President Clinton II.

Trump’s continuing success

As I posted earlier, our study of previous successful presidential candidates shows that success is very strongly correlated with a particular language model, consisting of:

  • Uniformly positive language
  • Complete absence of negative language
  • Using uplifting, aspirational metaphors rather than policy proposals, and
  • Ignoring the competing candidates

Trump presumably polls well, to a large extent, because he uses this language model (not so much ignoring of the competing candidates recently, but maybe that’s the effect of a primary). This language pattern tends to be used by incumbent presidents running for re-election, and seems to derive from their self-perception as already-successful in the job they’re re-applying for. Trump, similarly, possesses huge self confidence that seems to have the same effect — he perceives himself as (automatically, guaranteed) successful as president.

The dynamic between the successful self-perception issue and the competence issue was hard to separate before; and we’ve used ‘statesmanlike’ to describe the model of language of electoral success. All of the presidential incumbents whom we previously studied had a self-perception of success and a demonstrated competence and we assumed that both were necessary to deploy the required language comfortably and competently. Trump, however, shows that this isn’t so — it’s possible to possess the self-perception of success without the previously demonstrated competence. In Trump’s case, presumably, it is derived from competence in a rather different job: building a financial empire.

The media is in a frenzy about the competence issue for Trump. But our language model explains how it is possible to be popular among voters without demonstrating much competence, or even planned competence, to solve the problems of the day.

Voters don’t care about objective competence in the way that the media do. They care about the underlying personal self-confidence that is revealed in each candidate’s language. The data is very clear about this.

It may even be the rational view that a voter should take. Presidents encounter, in office, many issues that they had not previously formulated a policy for, so self-confidence may be more valuable than prepackaged plans. And voters have learned that most policies do not get implemented in office anyway.

It’s silly to treat Trump as a front runner when no actual vote has yet been cast. But it wouldn’t be surprising if he continues to do well for some time.  Of the other candidates, only Christie shows any sense of the use of positive language but, as a veteran politician, he cannot seem to avoid the need to present policies.

Results from the first Democratic debate

The debate held on Tuesday night pitted one well known figure (Hillary Clinton) against one up and coming figure (Sanders) and three others with no name recognition except among the wonkiest. The differences in exposure and preparation were obvious. I can’t see that it made any difference to anyone’s opinions.

But it remains interesting to see how well each person did at presenting a persona. Extremely well known politicians do not usually have the luxury of presenting themselves with a new, improved persona because the old one is so well known, so it’s common to find that persona deception scores are low for such candidates. For those who aren’t well-known, the strength of their persona is a blend of how well they can do it personally, and how big the gap is between their previous self-image and the persona that they are trying to project. A relatively unknown candidate with a high persona deception score, therefore, is likely to do well; one with a low score probably will not.

Here are the results from this debate:

deceptdocsThe red and greeen points represent artificial word use corresponding to moderately high amd moderately low levels of persona deception. Clinton, as expected (and from my analysis in the 2008 cycle) has low levels of persona deception. Sanders’s levels are in the mid-range. Chafee is sincere, but this won’t help him with his current level of recognition. O’Malley has the highest level of persona deception, which is a positive indicator for him (for what it’s worth in this crowd). Webb is also in the midrange, but his language use is quite different from that of Sanders.