Posts Tagged 'language'

Is “sentiment analysis” doing anything real?

Oceans of computational cycles have been spent analysing the sentiment of documents, driven by businesses interested in how their products are perceived, movie producers interested in reactions to their upcoming releases, and just about everyone interested in tweets.

Sentiment is based on a measure of how “positive” or “negative” a particular document is. The problem is that there are a number of aspects of an individual that could be positive or negative, and sentiment analysis jams them all into one bucket and measures them. It’s far from clear that this measures anything real — signs of which can be seen in the well-known one-and-a-half star difference when individuals are asked to rate the same objects on two successive days.
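A minimal lexicon-based scorer makes the one-bucket problem concrete. The word lists below are invented for illustration, but the structure is typical of simple sentiment systems: every positive or negative word contributes to a single score, whatever its source.

```python
# A minimal lexicon-based sentiment scorer of the kind critiqued here.
# The lexicons are illustrative, not from any real system.
POSITIVE = {"great", "love", "excellent", "happy", "wonderful"}
NEGATIVE = {"bad", "hate", "terrible", "annoying", "awful"}

def sentiment(text: str) -> float:
    """Return a score in [-1, 1]: (pos - neg) / total sentiment words."""
    words = text.lower().split()
    pos = sum(w.strip(".,!?") in POSITIVE for w in words)
    neg = sum(w.strip(".,!?") in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# The same attitude toward the product scores differently when mood or
# emotion words leak into the text:
print(sentiment("The product is great."))                               # 1.0
print(sentiment("The product is great, but this survey is annoying."))  # 0.0
```

The second sentence expresses exactly the same attitude toward the product, but an emotional reaction to the survey drags the single score to neutral.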

So what can be positive and negative?

It could be the individual’s attitude to a particular object and, of course, this is what most systems purport to be measuring. However, attitude is a two-place relation: A’s attitude to B. It’s usually obvious that a document has been written by A, but much more difficult to make sure that the object about which the attitude is being expressed is actually B.

However, most of the difficulty comes from other aspects that can also be positive and negative. One of these is mood. Mood is an internal setting whose drivers are poorly understood but which is known to be (a) predictable over the course of a period of, say, a day, and (b) composed of two independent components, positive mood and negative mood (that is, not opposites). In broad brush terms, negative mood is stable through the day, while positive mood peaks in the middle of the day. There are longer term patterns as well; positive mood tends to increase through the week while negative mood decreases.

Looking at someone’s writing about an object should therefore take into account their underlying mood, but it never does. And it would be difficult, with the current state of the art, to tease apart the signals of mood from the signals of attitude. But we could plausibly predict that measured “sentiment” would be less positive overall if it were captured at the beginning or end of the day.
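As a thought experiment, one could correct a raw sentiment score for an expected diurnal positive-mood curve before reading the remainder as attitude. Everything numeric here is an assumption invented for illustration: the cosine shape, the 13:00 peak, and the 0.3 weight are not fitted to any mood data.

```python
import math

def positive_mood_baseline(hour: int) -> float:
    """Illustrative diurnal curve: positive mood peaks mid-day (here, 13:00)
    and bottoms out twelve hours away. Shape and scale are assumptions."""
    return 0.5 + 0.5 * math.cos((hour - 13) / 24 * 2 * math.pi)

def mood_adjusted(raw_sentiment: float, hour: int) -> float:
    """Subtract an assumed positive-mood contribution (the 0.3 weight is
    arbitrary) before reading the remainder as attitude to the object."""
    return raw_sentiment - 0.3 * positive_mood_baseline(hour)

# The same raw score reads as weaker attitude at mid-day, when positive
# mood is expected to be inflating it, than late in the evening:
print(mood_adjusted(0.5, 13))
print(mood_adjusted(0.5, 23))
```

The point of the sketch is only that such a correction is mechanically easy; the hard part, as noted above, is knowing the real mood curve and weight for a given writer.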

The other aspect that can be positive or negative is emotion. Emotions are short-term responses to the current environment that play a role in reordering each individual’s priorities to optimize decision making, especially in response to an external stimulus.  There are two emotions that align strongly with positivity (joy) and negativity (disgust).

Looking at someone’s writing about an object should therefore take into account their emotional state (at the time they were writing) — but never does. Again it would be difficult to tease the signals of emotion and the signals of attitude apart. I have no doubt that many businesses get much worse results from their surveys than they ‘should’ because those surveys are designed so poorly that they become annoying, and this spills over into the content of the responses.

Bottom line: there is no such thing as positive sentiment or negative sentiment. There are positive or negative attitudes, moods, and emotions, but the one that sentiment analysis is trying to measure — attitudes — is inextricably confounded by the other two.  Progress is being made in understanding and detecting moods and emotions, but much less has been done on detecting attitudes, mostly because of the difficulty of finding the intended object within a short piece of text.



Advances in Social Network Analysis and Mining Conference — Sydney

This conference will be in Sydney in 2017, from 31st July to 3rd August.

As well as the main conference, there is also a workshop, FOSINT: Foundations of Open Source Intelligence, which may be of even more direct interest for readers of this blog.

I will also be giving a tutorial on Adversarial Analytics as part of the conference.

6.5/7 US presidential elections predicted from language use

I couldn’t do a formal analysis of Trump/Clinton language because Trump didn’t put his speeches online — indeed many of them weren’t scripted. But, as I posted recently, his language was clearly closer to our model of how to win elections than Clinton’s was.

So since 1992, the language model has correctly predicted the outcome, except for 2000 when the model predicted a very slight advantage for Gore over Bush (which is sort of what happened).

People judge candidates on who they seem to be as a person, a large part of which is transmitted by the language they use. Negative and demeaning statements obviously affect this, but so do positivity and optimism.

Voting is not rational choice

Pundits and the media continue to be puzzled by the popularity of Donald Trump. They point out that much of what he says isn’t true, that his plans lack content, that his comments about various subgroups are demeaning, and so on, and so on.

Underlying these plaintive comments is a fundamental misconception about how voters choose the candidate they will vote for. The choice has much more to do with the snap judgements of character and personality that humans make in the first few seconds than with calm, reasoned decision making.

Our analysis of previous presidential campaigns (about which I’ve posted earlier) makes it clear that this campaign is not fundamentally different in this respect. It’s always been the case that voters decide based on the person who appeals to them most on a deeper than rational level. As we discovered, the successful formula for winning is to be positive (Trump is good at this), not to be negative (Trump is poor at this), not to talk about policy (Trump is good at this), and not to talk about the opponent (Trump is poor at this). On the other hand, Hillary Clinton is poor at all four — she really, really believes in the rational voter.

We’ll see what happens in the election this week. But apart from the unusual facts of this presidential election, it’s easy to understand why Trump isn’t doing worse and Hillary Clinton isn’t doing better from the way they approach voters.

It’s not classified emails that are the problem

There’s been reporting that the email trove, belonging to Huma Abedin but found on the laptop of her ex-husband, got there as the result of automatic backups from her phone. This seems plausible; if it is true then it raises issues that go beyond whether any of the emails contain classified information or not.

First, it shows how difficult it is for ordinary people to understand the consequences of the choices they make when configuring their life-containing devices. Backing up emails is good, but every user needs to understand what that means, and how potentially invasive it is.

Second, to work as a backup site, this laptop must have been Internet-facing and (apparently) unencrypted. That means that more than half a million email messages were readily accessible to any reasonably adept cybercriminal or nation-state. If there are indeed classified emails among them, then that’s a big problem.

But even if there are not, access to someone’s emails, given the existence of textual analytics tools, means that a rich picture can be built up of that individual: what they are thinking about, who they are communicating with (their ego network in the jargon), what the rhythm of their day is, where they are located physically, what their emotional state is like, and even how healthy they are.
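As a sketch of one element of that picture, an ego network can be approximated from email headers alone, without reading a single message body. The addresses and header structure below are invented for illustration.

```python
# Building a crude ego network from email headers only; the mailbox
# owner's address and the sample headers are hypothetical.
from collections import Counter

OWNER = "owner@example.com"

def ego_network(headers: list[dict]) -> Counter:
    """Count how often each contact appears in From/To fields,
    excluding the mailbox owner (the ego) themselves."""
    contacts = Counter()
    for h in headers:
        contacts.update(a for a in [h["from"], *h["to"]] if a != OWNER)
    return contacts

headers = [
    {"from": OWNER, "to": ["alice@example.com"]},
    {"from": "boss@state.gov", "to": [OWNER]},
    {"from": OWNER, "to": ["alice@example.com", "boss@state.gov"]},
]
print(ego_network(headers).most_common())
```

Even this toy version reveals who the owner’s frequent correspondents are; with timestamps it would also reveal the rhythm of their day.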

For any of us, that kind of analysis would be quite invasive. But when the individual is a close confidante of the U.S. Secretary of State, and when many of the emails are from that same Secretary, a picture at this level of detail becomes valuable intelligence that an adversary could exploit.

Lawyers and the media gravitate to the classified information issue. This is a 20th-century view of the problems caused by revealing large amounts of personal text. The real issue is an order of magnitude more subtle, but also an order of magnitude more dangerous.

“It’s going to be really great”

Donald Trump continues to be the poster child for our election-winning language model: high positive language, as little negative language as possible, and appeals to policy goals without getting into details. The media and pundits are tearing their hair out because he refuses to talk about specifics but, as we predict, it’s working! (Interestingly, I went back and looked at Perot’s language in the 1992 election; he had more or less the same patterns, and he led both major-party candidates in national polls for a period in 1992.)

What the media and pundits don’t realise is that incumbent presidents running for a second term use language very similar to Trump’s. It’s just that, with a first-term track record, it’s not as glaringly obvious, and they don’t notice.

Trump’s continuing success

As I posted earlier, our study of previous successful presidential candidates shows that success is very strongly correlated with a particular language model, consisting of:

  • Uniformly positive language
  • Complete absence of negative language
  • Using uplifting, aspirational metaphors rather than policy proposals, and
  • Ignoring the competing candidates
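To make the four features concrete, here is a toy scorer in the spirit of that model. The word lists, the equal weights, and the form of the score are all invented for this sketch; they are not the actual model from our study.

```python
# Toy scorer for the four features; lexicons and weights are illustrative.
POSITIVE = {"great", "best", "win", "strong", "proud", "optimistic"}
NEGATIVE = {"bad", "worst", "fail", "weak", "disaster"}
POLICY = {"tax", "healthcare", "budget", "tariff", "regulation"}

def statesmanlike_score(speech: str, opponents: set[str]) -> float:
    """Higher is closer to the winning pattern: reward positive words;
    penalise negative words, policy detail, and opponent mentions."""
    words = [w.strip(".,!?").lower() for w in speech.split()]
    n = max(len(words), 1)
    pos = sum(w in POSITIVE for w in words) / n
    neg = sum(w in NEGATIVE for w in words) / n
    policy = sum(w in POLICY for w in words) / n
    opponent = sum(w in opponents for w in words) / n
    return pos - neg - policy - opponent

# An aspirational line scores above a policy-and-attack line:
print(statesmanlike_score("We will win and America will be great and strong", {"clinton"}))
print(statesmanlike_score("Her healthcare plan is a disaster", {"clinton"}))
```

Normalising by speech length matters: the pattern is about the proportion of a candidate’s language devoted to each feature, not raw word counts.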

Trump presumably polls well, to a large extent, because he uses this language model (not so much ignoring of the competing candidates recently, but maybe that’s the effect of a primary). This language pattern tends to be used by incumbent presidents running for re-election, and seems to derive from their self-perception as already-successful in the job they’re re-applying for. Trump, similarly, possesses huge self confidence that seems to have the same effect — he perceives himself as (automatically, guaranteed) successful as president.

The self-perception-of-success issue and the competence issue were hard to separate before; we have used ‘statesmanlike’ to describe this model of election-winning language. All of the presidential incumbents we previously studied had both a self-perception of success and demonstrated competence, and we assumed that both were necessary to deploy the required language comfortably and competently. Trump, however, shows that this isn’t so: it’s possible to possess the self-perception of success without previously demonstrated competence. In Trump’s case it presumably derives from competence in a rather different job: building a financial empire.

The media is in a frenzy about the competence issue for Trump. But our language model explains how it is possible to be popular among voters without demonstrating much competence, or even planned competence, to solve the problems of the day.

Voters don’t care about objective competence in the way that the media does. They care about the underlying personal self-confidence revealed in each candidate’s language. The data is very clear about this.

It may even be the rational view that a voter should take. Presidents encounter, in office, many issues that they had not previously formulated a policy for, so self-confidence may be more valuable than prepackaged plans. And voters have learned that most policies do not get implemented in office anyway.

It’s silly to treat Trump as a front runner when no actual vote has yet been cast. But it wouldn’t be surprising if he continues to do well for some time. Of the other candidates, only Christie shows any feel for the use of positive language but, as a veteran politician, he cannot seem to avoid the need to present policies.