Posts Tagged 'sentiment analysis'

Is “sentiment analysis” doing anything real?

Oceans of computational cycles have been spent analysing the sentiment of documents, driven by businesses interested in how their products are being perceived, movie producers interested in their potential products, and just about everyone interested in tweets.

Sentiment is based on a measure of how “positive” or “negative” a particular document is. The problem is that there are a number of aspects of an individual that could be positive or negative, and sentiment analysis jams them all into one bucket and measures them. It’s far from clear that this measures anything real — signs of which can be seen in the well-known one-and-a-half star difference when individuals are asked to rate the same objects on two successive days.

So what can be positive and negative?

It could be the individual’s attitude to a particular object and, of course, this is what most systems purport to be measuring. However, attitude is a two-place relation: A’s attitude to B. It’s usually obvious that a document has been written by A, but much more difficult to make sure that the object about which the attitude is being expressed is actually B.

However, most of the difficulty comes from other aspects that can also be positive and negative. One of these is mood. Mood is an internal setting whose drivers are poorly understood but which is known to be (a) predictable over the course of a period of, say, a day, and (b) composed of two independent components, positive mood and negative mood (that is, they are not opposites). In broad-brush terms, negative mood is stable through the day, while positive mood peaks in the middle of the day. There are longer-term patterns as well; positive mood tends to increase through the week while negative mood decreases.

Looking at someone’s writing about an object therefore should take into account their underlying mood — but never does. And it would be difficult to tease apart the signals of mood from the signals of attitude with the current state of the art. But we could plausibly predict that “sentiment” would be less positive overall if it was captured at the beginning or end of the day.

The other aspect that can be positive or negative is emotion. Emotions are short-term responses to the current environment that play a role in reordering each individual’s priorities to optimize decision making, especially in response to an external stimulus. Two emotions align strongly with positivity and negativity: joy and disgust, respectively.

Looking at someone’s writing about an object should therefore take into account their emotional state (at the time they were writing) — but never does. Again it would be difficult to tease the signals of emotion and the signals of attitude apart. I have no doubt that many businesses get much worse results from their surveys than they ‘should’ because those surveys are designed so poorly that they become annoying, and this spills over into the content of the responses.

Bottom line: there is no such thing as positive sentiment or negative sentiment. There are positive or negative attitudes, moods, and emotions, but the one that sentiment analysis is trying to measure (attitude) is inextricably confounded by the other two. Progress is being made in understanding and detecting moods and emotions, but much less has been done on detecting attitudes, mostly because of the difficulty of finding the intended object within a short piece of text.


What can be learned from text III

Another property that can be learned from text is the author’s attitude to whatever the text is about. This is called, variously, sentiment analysis or appraisal theory. For obvious reasons, it has always been interesting to advertisers and marketers.

In its simplest form, it just analyzes text for associations of adjectives with the nouns of interest, for example films or people. This could be as simple as seeing whether the adjective “good” or “bad” appears near the noun(s) in question. It is not too difficult to extend this to other sets of adjectives that can be considered positive or negative: “the movie was exciting” (good), or “the movie was boring” (bad).
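To make the simplest form concrete, here is a minimal sketch, assuming tiny hand-picked word lists and a fixed token window. The function name `proximity_sentiment`, the `window` parameter, and the lexicons are all illustrative choices, not taken from any particular system.

```python
import re

# Tiny illustrative lexicons (an assumption); real systems use far larger lists.
POSITIVE = {"good", "exciting"}
NEGATIVE = {"bad", "boring"}


def proximity_sentiment(text, noun, window=3):
    """Sum the polarity of opinion words within `window` tokens of `noun`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token != noun:
            continue
        start = max(0, i - window)
        # Tokens on either side of this occurrence of the noun.
        nearby = tokens[start:i] + tokens[i + 1:i + 1 + window]
        score += sum(word in POSITIVE for word in nearby)
        score -= sum(word in NEGATIVE for word in nearby)
    return score


print(proximity_sentiment("The movie was exciting", "movie"))  # 1
print(proximity_sentiment("The movie was boring", "movie"))    # -1
```

A positive score stands for "good" and a negative score for "bad"; a score of zero means no opinion word from the lists fell inside the window.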

However, this process is not quite as easy as it looks. First of all, it’s hard in languages like English to be sure which adjective goes with which noun — proximity in the sentence is often used, but this is not very robust: “Although parts of the movie were good, overall it was bad” is not a positive comment about the movie.
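The fragility of proximity can be seen in a small sketch, assuming a two-word lexicon and a hypothetical helper, `nearest_opinion_word`, that attributes to the noun whichever opinion word sits closest to it.

```python
import re

# Two-word lexicon, purely for illustration (an assumption).
OPINION_WORDS = {"good", "bad"}


def nearest_opinion_word(text, noun):
    """Return the opinion word closest to `noun` by token distance, or None."""
    tokens = re.findall(r"[a-z']+", text.lower())
    noun_positions = [i for i, t in enumerate(tokens) if t == noun]
    if not noun_positions:
        return None
    candidates = [
        (min(abs(i - p) for p in noun_positions), t)
        for i, t in enumerate(tokens)
        if t in OPINION_WORDS
    ]
    return min(candidates)[1] if candidates else None


sentence = "Although parts of the movie were good, overall it was bad"
print(nearest_opinion_word(sentence, "movie"))  # good
```

Proximity picks “good” because it is two tokens from “movie”, while “bad” is six tokens away and attached to the pronoun “it”. Getting the right answer would require knowing that “it” refers to the movie and that “overall” outranks “parts of”.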

Second, authors often use devices such as irony and sarcasm which look, syntactically, as if they are giving one opinion, but are actually giving the opposite opinion. Humans figure this out using deep background knowledge about the situation and about human mental life, so it’s difficult for an algorithm to mimic this level of understanding.

Third, texts often comment about the parts of an object as well as the whole object, and it becomes difficult to decide which adjectives go with which parts.

There are three levels of algorithmic analysis used for this problem:

  1. Using simple sets of opinion adjectives (and maybe other words) and trying to associate them with the nouns of interest using proximity, perhaps with a little extra sophistication such as trying to pick out dependent clauses.
  2. Parsing the text more deeply and using natural language analysis techniques to associate opinion words with the nouns of interest.
  3. Using systemic functional linguistics approaches, which treat language generation as a goal-driven task by an individual in a societal setting, as well as a technology.

These levels are arranged in increasing order of sophistication, and also of complexity. However, even the best algorithms perform only at about the 80% level, and that is when capturing only relatively unsophisticated judgements.

There are obvious applications of sentiment analysis in adversarial situations: trying to decide whether a terrorist group pronouncement or a threat represents a genuine opinion by the author or some form of propaganda, and at whom the propaganda might be aimed.