Posts Tagged 'corpus analytics'

Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda

I’ve just been working on assessing the potential impact of three major magazines, Inspire (AQAP), Azan (Taliban), and Dabiq (ISIS), which compete for the market in lone-wolf jihadists in the West.

I compare these magazines using models for the intensity of informative, imaginative, deceptive, jihadist, and gamification language, and build an empirical model of propaganda that combines these into a single scale.

Unsurprisingly, Dabiq ranks highest in propaganda intensity.

The details can be found in the full draft paper, posted to SSRN:

Skillicorn, David, Empirical Assessment of Al Qaeda, Isis, and Taliban Propaganda (January 7, 2015). Available at SSRN: http://ssrn.com/abstract=2546478.


Benford’s Law in action

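Benford’s Law is about the distribution of initial digits in numbers from the real world: small leading digits are much more common than large ones, with 1 appearing as the first digit about 30% of the time. It plays a role in detecting, for example, financial fraud in tax returns, because made-up numbers tend not to follow this distribution and so are quite easily distinguishable from actual ones.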
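
As a rough illustration, here is a minimal Python sketch of the standard statement of Benford’s Law, where the expected share of leading digit d is log10(1 + 1/d), and of how observed leading digits could be compared against it. The function names are made up for this example, and the powers of 2 are just a convenient test sequence, not data from any analysis described here.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(x):
    """First nonzero digit of a nonzero number."""
    # Scientific notation puts the leading digit first, e.g. '1.947...e+03'.
    return int(f"{abs(x):.15e}"[0])

def leading_digit_freqs(values):
    """Observed share of each leading digit 1..9, ignoring zeros."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("no nonzero values to analyse")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

if __name__ == "__main__":
    expected = benford_expected()
    # Illustrative sequence only: powers of 2 are a classic Benford-like example.
    observed = leading_digit_freqs([2 ** k for k in range(1, 101)])
    for d in range(1, 10):
        print(f"{d}: expected {expected[d]:.3f}, observed {observed[d]:.3f}")
```

A large gap between the two columns for real-world data is the kind of signal that suggests the numbers may have been invented, although it is only a hint and not proof.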

There have been several attempts to explain Benford’s Law based on the processes that give rise to actual numbers. I’ve been analysing U.S. State of the Union speeches over the past 200+ years. The patterns agree with what Benford’s Law would predict, but it’s much less clear that the putative explanations make sense, given the time scale and varying authorship of these documents.

Here’s the list of the top 100 numbers (both digits and number words) that appear in the speeches, in decreasing order of frequency of occurrence, with increasing indents by magnitude:

     
          000
one
two
1
2
3
three
     30
          million
5
4
          billion
     10
6
four
     20
          500
     12
five
7
     15
          hundred
9
     50
     25
          100
8
     11
     14
ten
six
     40
     13
     16
     18
     00
     22
          thousand
seven
     17
               1947
          300
          200
     24
eight
     60
               1890
     27
     21
     35
     28
     26
     23
     19
     90
          400
     80
                1893
     70
                1946
                1945
     75
                1899
          600
     31
     twenty
     33
          700
     45
                1860
                1891
                1898
     65
          250
          150
                1892
     fifty
                1900
     twelve
     41
     37
                1894
nine
                1897
                1846
                1861
     36
                1878
     55
                1911
                1889
     thirty
                1909
     29
     32
                1885
                1858
     54
     34
     63

Within each range, the larger the number the lower its frequency. But there are some interesting exceptions: numbers that are multiples of 5 or 10 tend to appear ‘earlier’ than they should. The references to years almost all cluster around the end of the 19th Century and the beginning of the 20th.
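
A listing like the one above could be produced along the following lines. This is only a sketch under stated assumptions: the tokenisation rule, the `WORD_LEVEL` table, and the function names are hypothetical and are not the method actually used for the State of the Union analysis.

```python
import re
from collections import Counter

# Hypothetical mapping from number words to an indent level (assumption).
WORD_LEVEL = {
    **{w: 0 for w in "one two three four five six seven eight nine".split()},
    **{w: 1 for w in "ten eleven twelve twenty thirty forty fifty".split()},
    **{w: 2 for w in "hundred thousand million billion".split()},
}

def level(token):
    """Indent level: digit strings by length, number words by lookup."""
    if token.isdigit():
        return len(token) - 1
    return WORD_LEVEL[token]

def number_tokens(text):
    """Lowercase the text and keep digit runs and known number words."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return [t for t in tokens if t.isdigit() or t in WORD_LEVEL]

def ranked_listing(text, top=100, indent=5):
    """Most frequent number tokens first, indented by magnitude."""
    counts = Counter(number_tokens(text))
    return [" " * (indent * level(tok)) + tok
            for tok, _ in counts.most_common(top)]

if __name__ == "__main__":
    sample = "In 1947 we spent 1,000 dollars; one in 10, one in 1947."
    print("\n".join(ranked_listing(sample)))
```

Under this tokenisation a figure such as 1,000 splits into the two tokens 1 and 000, which would be one plausible reason for an entry like 000 to show up in such a list.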
