Posts Tagged 'language'

Democratic debates strategy

In an analysis of the language used by US presidential candidates in the last 7 elections, Christian Leuprecht and I showed that there’s a language pattern that predicts the winner, and even the margin. The pattern is this: use lots of positive language, use no negative language at all (even words like ‘don’t’ and ‘won’t’), talk about abstractions not policy, and don’t talk about your opponent(s). (For example, Trump failed on the fourth point, but was good on the others, while Hillary Clinton did poorly on all four.)
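To make the pattern concrete, here is a minimal sketch of how a speech might be scored against the four criteria. The word lists are invented placeholders, not the lexicons used in the published analysis, and real scoring would be considerably more careful.

```python
# Toy scorer for the four-part pattern. The word lists are
# illustrative stand-ins, not the lexicons used in the study.

POSITIVE = {"great", "hope", "future", "together", "best", "proud"}
NEGATIVE = {"don't", "won't", "never", "wrong", "fail", "worst"}
POLICY   = {"tax", "healthcare", "budget", "tariff", "regulation"}
OPPONENT = {"opponent", "clinton", "trump"}  # rivals' names, per race

def pattern_score(speech: str) -> dict:
    words = speech.lower().split()
    n = len(words) or 1
    return {
        "positive": sum(w in POSITIVE for w in words) / n,  # want: high
        "negative": sum(w in NEGATIVE for w in words) / n,  # want: zero
        "policy":   sum(w in POLICY for w in words) / n,    # want: low
        "opponent": sum(w in OPPONENT for w in words) / n,  # want: zero
    }

print(pattern_score("Together we will build a great future full of hope"))
```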

In some ways, this pattern is intuitive: voters don’t make rational choices of the most qualified candidate — they vote for someone they relate to.

Why don’t candidates use this pattern? Because the media hates it! Candidates (except Trump) fear being labelled as shallow by the media, even though using the pattern helps them with voters. You can see this at work in the way opinion pieces decide who ‘won’ each debate.

The Democratic debates show candidates using the opposite strategy: lots of detailed policy, lots of negativity (what’s wrong that I will fix), and lots of putting each other down.

Now it’s possible that the strategy needed to win a primary is different from the one that wins a general election. But for those candidates who might make it through, this pattern helps to gauge their chances against Trump in 2020.

Incumbency effects in U.S. presidential campaigns: Language patterns matter, Electoral Studies, Vol. 43, 95–103.
https://www.sciencedirect.com/science/article/pii/S0261379416302062

Detecting abusive language online

My student, Hannah Leblanc, has just defended her thesis on predicting abusive language. The document is here:

https://qspace.library.queensu.ca/handle/1974/26252

Rather than treat this as an empirical problem — gather all the signal you can, select attributes using training data, and then build a predictor using those attributes — she started with models of what might drive abusive language. In particular, abuse may be associated with subjectivity (objective language is less likely to be abusive, even if it contains individual words that might look abusive) and with otherness (abuse often results from one group targeting another). She also looked at emotion and mood signals and their association with abuse.
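As a rough illustration of the model-driven approach (and emphatically not the thesis’s actual implementation), per-document rates like these could be computed and fed to a conventional classifier:

```python
# Sketch of per-document rates for subjectivity, otherness, and
# negative-emotion signals. All of the word lists are hypothetical.

SUBJECTIVE  = {"i", "feel", "think", "believe", "hate", "love"}
OTHERING    = {"they", "them", "those", "outsiders"}
NEG_EMOTION = {"angry", "furious", "disgusting", "sick"}

def model_features(text: str) -> dict:
    words = text.lower().split()
    n = len(words) or 1
    return {
        "subjectivity": sum(w in SUBJECTIVE for w in words) / n,
        "otherness":    sum(w in OTHERING for w in words) / n,
        "neg_emotion":  sum(w in NEG_EMOTION for w in words) / n,
    }

# These rates would then be used to train any standard classifier
# on labelled abusive/non-abusive documents.
```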

All of the models perform almost perfectly at detecting non-abuse; they struggle more with detecting abuse. Some of this comes from mislabelling: documents that are marked as abusive but really aren’t. Much of the rest comes from missing signal: abusive words disguised so that they don’t match the words of a lexicon.
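A normalisation step like the following, run before lexicon matching, suggests how disguised words might be recovered. The substitution table is a guess at common obfuscations, not the method used in the thesis.

```python
import re

# Undo common character-substitution disguises before lexicon lookup.
LEET = str.maketrans({"@": "a", "$": "s", "0": "o", "1": "i", "3": "e"})

def normalise(word: str) -> str:
    word = word.lower().translate(LEET)
    # Collapse runs of the same letter ("stuuupid" -> "stupid").
    # This over-corrects words with legitimate double letters, so a
    # real system would check candidates against a dictionary.
    return re.sub(r"(.)\1+", r"\1", word)

assert normalise("id10t") == "idiot"
assert normalise("stuuupid") == "stupid"
```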

Overall, the model achieves an accuracy of 95% and an F-score of 0.91.

Is “sentiment analysis” doing anything real?

Oceans of computational cycles have been spent analysing the sentiment of documents, driven by businesses interested in how their products are being perceived, movie producers interested in how their upcoming releases will be received, and just about everyone when it comes to tweets.

Sentiment is based on a measure of how “positive” or “negative” a particular document is. The problem is that there are a number of aspects of an individual that could be positive or negative, and sentiment analysis jams them all into one bucket and measures them. It’s far from clear that this measures anything real — signs of which can be seen in the well-known one-and-a-half star difference when individuals are asked to rate the same objects on two successive days.
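For reference, this is essentially all that a typical lexicon-based sentiment system does, shown in miniature with made-up word lists: one list of positive words, one of negative words, one number out.

```python
# A minimal single-bucket sentiment scorer, of the kind critiqued here.
POS = {"good", "great", "happy", "love", "excellent"}
NEG = {"bad", "terrible", "hate", "awful", "poor"}

def sentiment(text: str) -> float:
    words = text.lower().split()
    hits = sum(w in POS for w in words) - sum(w in NEG for w in words)
    return hits / (len(words) or 1)

# "I love this phone" and "what a great morning" both score positive,
# although the first expresses an attitude to an object and the
# second mostly reflects mood.
```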

So what can be positive and negative?

It could be the individual’s attitude to a particular object and, of course, this is what most systems purport to be measuring. However, attitude is a two-place relation: A’s attitude to B. It’s usually obvious that a document has been written by A, but much more difficult to make sure that the object about which the attitude is being expressed is actually B.

However, most of the difficulty comes from other aspects that can also be positive and negative. One of these is mood. Mood is an internal setting whose drivers are poorly understood but which is known to be (a) predictable over the course of a day, and (b) composed of two independent components, positive mood and negative mood (that is, they are not opposites). In broad brush terms, negative mood is stable through the day, while positive mood peaks in the middle of the day. There are longer-term patterns as well: positive mood tends to increase through the week while negative mood decreases.

Looking at someone’s writing about an object should therefore take into account their underlying mood — but never does. And it would be difficult, with the current state of the art, to tease apart the signals of mood from the signals of attitude. But we could plausibly predict that “sentiment” would be less positive overall if it were captured at the beginning or end of the day.
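If the diurnal mood curve were known, a correction could in principle look something like this. Both the curve and the weighting here are invented numbers, included purely to show the shape of the idea.

```python
from datetime import datetime

def mood_adjusted(raw_sentiment: float, when: datetime) -> float:
    """Subtract a crude, hypothetical estimate of diurnal positive mood."""
    hour = when.hour + when.minute / 60
    # Triangular peak around midday, scaled to [0, 1].
    positive_mood = max(0.0, 1 - abs(hour - 13) / 13)
    # 0.1 is an invented weight for mood's share of the signal.
    return raw_sentiment - 0.1 * (positive_mood - 0.5)
```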

The other aspect that can be positive or negative is emotion. Emotions are short-term responses to the current environment that play a role in reordering each individual’s priorities to optimize decision making, especially in response to an external stimulus. There are two emotions that align strongly with positivity (joy) and negativity (disgust).

Looking at someone’s writing about an object should therefore take into account their emotional state (at the time they were writing) — but never does. Again, it would be difficult to tease apart the signals of emotion and the signals of attitude. I have no doubt that many businesses get much worse results from their surveys than they ‘should’ because those surveys are designed so poorly that they become annoying, and this spills over into the content of the responses.

Bottom line: there is no such thing as positive sentiment or negative sentiment. There are positive or negative attitudes, moods, and emotions, but the one that sentiment analysis is trying to measure — attitudes — is inextricably confounded by the other two. Progress is being made in understanding and detecting moods and emotions, but much less has been done on detecting attitudes, mostly because of the difficulty of finding the intended object within a short piece of text.


Advances in Social Networks Analysis and Mining Conference — Sydney

This conference will be in Sydney in 2017, from 31st July to 3rd August.

http://asonam.cpsc.ucalgary.ca/2017/

As well as the main conference, there is a workshop, FOSINT: Foundations of Open Source Intelligence, which may be of even more direct interest to readers of this blog.

I will also be giving a tutorial on Adversarial Analytics as part of the conference.

6.5/7 US presidential elections predicted from language use

I couldn’t do a formal analysis of Trump/Clinton language because Trump didn’t put his speeches online — indeed many of them weren’t scripted. But, as I posted recently, his language was clearly closer to our model of how to win elections than Clinton’s was.

So since 1992, the language model has correctly predicted the outcome, except for 2000, when the model predicted a very slight advantage for Gore over Bush (which is sort of what happened: Gore won the popular vote but lost the Electoral College).

People judge candidates on who they seem to be as a person, a large part of which is transmitted by the language they use. Negative and demeaning statements obviously affect this, but so do positivity and optimism.

Voting is not rational choice

Pundits and the media continue to be puzzled by the popularity of Donald Trump. They point out that much of what he says isn’t true, that his plans lack content, that his comments about various subgroups are demeaning, and so on, and so on.

Underlying these plaintive comments is a fundamental misconception about how voters choose the candidate they will vote for. The choice has much more to do with the snap judgements of character and personality that humans make in the first few seconds than with calm, reasoned decision making.

Our analysis of previous presidential campaigns (about which I’ve posted earlier) makes it clear that this campaign is not fundamentally different in this respect. It’s always been the case that voters decide based on the person who appeals to them most on a deeper than rational level. As we discovered, the successful formula for winning is to be positive (Trump is good at this), not to be negative (Trump is poor at this), not to talk about policy (Trump is good at this), and not to talk about the opponent (Trump is poor at this). On the other hand, Hillary Clinton is poor at all four — she really, really believes in the rational voter.

We’ll see what happens in the election this week. But apart from the unusual facts of this presidential election, it’s easy to understand why Trump isn’t doing worse and Hillary Clinton isn’t doing better from the way they approach voters.

It’s not classified emails that are the problem

There’s been reporting that the email trove, belonging to Huma Abedin but found on the laptop of her ex-husband, got there as the result of automatic backups from her phone. This seems plausible; if it is true then it raises issues that go beyond whether any of the emails contain classified information or not.

First, it shows how difficult it is for ordinary people to understand, and realise, the consequences of their choices about configuring their life-containing devices. Backing up emails is good, but every user needs to understand what that means, and how potentially invasive it is.

Second, to work as a backup site, this laptop must have been Internet-facing and (apparently) unencrypted. That means that more than half a million email messages were readily accessible to any reasonably adept cybercriminal or nation-state. If there are indeed classified emails among them, then that’s a big problem.

But even if there are not, access to someone’s emails, given the existence of textual analytics tools, means that a rich picture can be built up of that individual: what they are thinking about, who they are communicating with (their ego network, in the jargon), what the rhythm of their day is, where they are located physically, what their emotional state is like, and even how healthy they are.
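Even before any textual analytics, the message headers alone yield much of this picture. Here is a sketch, assuming the mailbox has been parsed into (sender, recipient, Date header) triples:

```python
from collections import Counter
from email.utils import parsedate_to_datetime

def metadata_profile(messages):
    """messages: iterable of (sender, recipient, date_header) triples."""
    ego_network = Counter()  # who communicates with whom, and how often
    hours = Counter()        # the rhythm of the day
    for sender, recipient, date_header in messages:
        ego_network[(sender, recipient)] += 1
        hours[parsedate_to_datetime(date_header).hour] += 1
    return ego_network, hours
```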

For any of us, that kind of analysis would be quite invasive. But when the individual is a close confidante of the U.S. Secretary of State, and when many of the emails are from that same Secretary, a picture of them at this level of detail becomes extremely valuable, and could be exploited by an adversary.

Lawyers and the media gravitate to the classified-information issue. This is a 20th-century view of the problems caused by revealing large amounts of personal text. The real issue is an order of magnitude more subtle, but also an order of magnitude more dangerous.