What can be learned from text III

Another property that can be learned from text is the author’s attitude to whatever the text is about. This is called, variously, sentiment analysis or appraisal theory. For obvious reasons, it has always been interesting to advertisers and marketers.

In its simplest form, it just analyzes text for associations of adjectives with the nouns of interest, for example films or people. This could be as simple as seeing whether the adjective “good” or “bad” appears near the noun(s) in question. It is not too difficult to extend this to other sets of adjectives that can be considered positive or negative: “the movie was exciting” (good), or “the movie was boring” (bad).
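As a rough illustration of this simplest form, here is a minimal Python sketch. The word lists, the window size of three tokens, and the function name `proximity_sentiment` are all assumptions made for the example; they are not a standard lexicon or library API.

```python
import re

# Tiny illustrative lexicons (assumed for this sketch, not a real resource).
POSITIVE = {"good", "exciting"}
NEGATIVE = {"bad", "boring"}


def proximity_sentiment(text, target, window=3):
    """Score `target` by counting opinion words within `window` tokens of it.

    Returns a positive number for a favourable mention, a negative number
    for an unfavourable one, and 0 when no opinion word is close enough.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Look at the tokens on either side of this occurrence of the target.
        nearby = tokens[max(0, i - window): i + window + 1]
        score += sum(word in POSITIVE for word in nearby)
        score -= sum(word in NEGATIVE for word in nearby)
    return score


print(proximity_sentiment("The movie was exciting", "movie"))
print(proximity_sentiment("The movie was boring", "movie"))
```

The score depends entirely on which words happen to fall inside the window, so a larger or smaller window can change the verdict for the same sentence.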

However, this process is not quite as easy as it looks. First, in languages like English it is hard to be sure which adjective goes with which noun. Proximity in the sentence is often used, but it is not very robust: “Although parts of the movie were good, overall it was bad” is not a positive comment about the movie, even though “good” sits closer to “movie” than “bad” does.

Second, authors often use devices such as irony and sarcasm which look, syntactically, as if they are giving one opinion, but are actually giving the opposite opinion. Humans figure this out using deep background knowledge about the situation and about human mental life, so it’s difficult for an algorithm to mimic this level of understanding.

Third, texts often comment about the parts of an object as well as the whole object, and it becomes difficult to decide which adjectives go with which parts.
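To make the parts-versus-whole difficulty concrete, the following Python sketch assigns each opinion word to whichever noun of interest is nearest to it. The lexicon, the example sentences, and the function name `score_by_nearest_target` are hypothetical and chosen only for illustration.

```python
import re

# Assumed toy lexicon: +1 for positive words, -1 for negative ones.
OPINION = {"good": 1, "great": 1, "bad": -1, "poor": -1}


def score_by_nearest_target(text, targets):
    """Attach every opinion word to the closest target noun and sum the scores."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positions = [(i, t) for i, t in enumerate(tokens) if t in targets]
    scores = {t: 0 for t in targets}
    for i, token in enumerate(tokens):
        if token not in OPINION or not positions:
            continue
        # Pick the nearest target mention; ties go to the earlier mention.
        _, nearest = min(positions, key=lambda p: (abs(p[0] - i), p[0]))
        scores[nearest] += OPINION[token]
    return scores


# Works when each opinion word sits next to the part it describes.
print(score_by_nearest_target(
    "The acting was great but the plot was poor", ["acting", "plot", "movie"]))

# Here "poor" describes the plot, but "movie" is the nearer noun.
print(score_by_nearest_target(
    "The plot of the movie was poor", ["plot", "movie"]))
```

In the second sentence the criticism is aimed at the plot, yet the nearest-noun rule charges it to the movie as a whole, which is exactly the kind of misattribution described above.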

There are three levels of algorithmic analysis used for this problem:

  1. Using simple sets of opinion adjectives (and perhaps other words) and associating them with the nouns of interest by proximity, perhaps with a little extra sophistication such as picking out dependent clauses.
  2. Parsing the text more deeply and using natural language analysis techniques to associate opinion words with the nouns of interest.
  3. Using systemic functional linguistics approaches, which treat language generation as a goal-driven task by an individual in a societal setting, as well as a technology.

These levels are arranged in increasing order of sophistication, and also of complexity. However, even the best algorithms are right only about 80% of the time, and that is for relatively unsophisticated judgements.

There are obvious applications of sentiment analysis in adversarial situations: trying to decide whether a terrorist group's pronouncement or threat represents the author's genuine opinion or some form of propaganda, and at whom the propaganda might be aimed.
