Posts Tagged 'Twitter'

Businesses processing emails

The Daily Mail reports an experiment by High-Tech Bridge, which sent private emails or uploaded documents containing unique URLs to 50 different platforms and then waited to see whether, and by whom, those URLs were visited.

Sure enough, several of them were visited by the businesses that had handled the matching documents, including Facebook, Twitter, and Google. This won’t come as a surprise to readers of this blog, but it once again shows the extent to which businesses like these process any documents they see to extract models of the sender and receiver.
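The logic of an experiment like this is easy to sketch. What follows is a minimal illustration, not High-Tech Bridge’s actual tooling: each platform gets its own unguessable token embedded in a URL, and any later request for that URL can be attributed to the platform that received it. The domain `canary.example.com`, the function names, and the assumption that the server’s access log yields the full requested URLs are all hypothetical.

```python
import secrets
from urllib.parse import urlparse

# Hypothetical base URL on a server the experimenter controls.
CANARY_BASE = "https://canary.example.com/t/"


def make_canary_tokens(platforms):
    """Assign each platform its own unguessable token; return {token: platform}."""
    return {secrets.token_urlsafe(16): platform for platform in platforms}


def canary_url(token):
    """The URL to embed in the email or document sent to one platform."""
    return CANARY_BASE + token


def attribute_visits(requested_urls, token_to_platform):
    """Count visits per platform from the URLs seen in the server's access log.

    Requests whose token was never issued (e.g. scanners guessing paths) are ignored.
    """
    visits = {}
    for url in requested_urls:
        token = urlparse(url).path.rsplit("/", 1)[-1]
        platform = token_to_platform.get(token)
        if platform is not None:
            visits[platform] = visits.get(platform, 0) + 1
    return visits
```

Because each token goes to exactly one platform, a visit to its URL can only have been triggered by something that saw that platform’s copy of the message.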

There has been some confusion in the media about how this process might work. Evidently it isn’t obvious to many people that the process is automated: no one is ‘reading’ these documents; they are being processed by software that can also fetch the pages they link to and process the contents of those pages. It would help if we agreed on verbs, simple enough for the wider public to grasp, that distinguished ‘read by a human’ from ‘processed by software’.
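To make the ‘processed by software’ point concrete, here is a minimal sketch of the first step such software might take: pulling the links out of a message so that a crawler can fetch them later. The regular expression and the function names are assumptions for illustration; this is not how any particular company does it, and real systems use far more robust URL parsers.

```python
import re

# A deliberately simple URL pattern (an assumption, not a production parser).
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def extract_links(text):
    """Return the URLs in a document, in order of first appearance, without duplicates."""
    # Strip punctuation that usually belongs to the surrounding sentence.
    cleaned = (match.rstrip(".,;:!?") for match in URL_RE.findall(text))
    return list(dict.fromkeys(cleaned))


def build_fetch_queue(documents):
    """Merge the links from many documents into one queue a crawler could fetch."""
    return list(dict.fromkeys(url for doc in documents for url in extract_links(doc)))
```

Nothing in this involves anyone reading the message; it is pattern matching followed, in a real system, by automated fetching.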

Compelling evidence on Benghazi timeline

Kathleen Carley presented work on the social media data flow before, during, and after the Benghazi embassy attack in September 2012. She happened to be teaching a course on analysis of social media (tweets and mainstream media) over the early part of September and was able to quickly repurpose it.

Her results show that, in Libya, there was no social media discussion of the embassy attacks until several hours after they happened. Discussion of the infamous movie also only begins well after the attacks and then only as a result of speculation about whether it played any role.

In contrast, Egyptian social media feeds were abuzz with demonstration rhetoric well before the activity in Cairo.

This seems to provide a compelling argument against any “spontaneous demonstration” scenario as an explanation of what happened in Benghazi (if anyone still believes it). It’s also a nice demonstration of the coming of age of real-time social media analysis, although it shows, too, that getting real-time analysis requires having a team in place beforehand.

The reference is: Near Real Time Assessment of Social Media Using Geo-Temporal Network Analytics, Kathleen M. Carley, Juergen Pfeffer, Huan Liu, Fred Morstatter, Rebecca Goolsby, Proceedings of Advances in Social Networks Analysis and Mining (ASONAM) 2013, ACM & IEEE, 517-524.
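Carley’s paper uses geo-temporal network analytics, which is considerably richer than anything shown here. The simplest version of the comparison described above, namely how long before or after an event a topic first appears in one region’s posts, can be sketched as follows. The `(timestamp, region, text)` record format, the keyword matching, and the function name are all assumptions; this is not the method from the paper.

```python
from datetime import datetime, timedelta


def first_mention_lag(posts, region, keywords, event_time: datetime):
    """Hours from event_time to the first post in `region` mentioning any keyword.

    posts: iterable of (timestamp, region, text) tuples (an assumed format).
    keywords: lower-case strings; matching is a plain substring test.
    Returns a negative number if discussion began before the event,
    or None if the topic never appears in that region.
    """
    times = [
        ts
        for ts, post_region, text in posts
        if post_region == region and any(k in text.lower() for k in keywords)
    ]
    if not times:
        return None
    return (min(times) - event_time) / timedelta(hours=1)
```

A negative lag in one region and a positive lag in another is the shape of the Egypt/Libya contrast described above, although a single first-mention time is obviously fragile to one stray post.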

Notes from ASONAM 2013

I’m at ASONAM in Niagara Falls. I have noticed a few macro changes from the same conference last year in Istanbul:

  1. There is almost no interest in any form of clustering or community detection. I think that this is the result not of solving the problem but realising that it isn’t a well-formed problem for social networks (regular readers will be aware of my thoughts about this);
  2. There is growing awareness that preferential attachment does not generate networks that look very realistic, except when you look at them from a long way off with your eyes half closed (and some hope that models like forest fire might be close to usable);
  3. There has been a significant amount of progress in understanding the language use of tweets despite the obvious issues of dialect/patois, short length, mistyping and, often, lack of mental engagement when writing. I thought there was very little hope for this, so I’m delighted to be proved wrong. There are starting to be useful results learned from tweet corpora.
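For readers who haven’t seen it, preferential attachment is simple enough to sketch in a few lines. This is a Barabási–Albert-style variant written for illustration: the small complete starting graph and the function name are assumptions, and libraries such as NetworkX provide a ready-made `barabasi_albert_graph(n, m)`.

```python
import random


def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph to n nodes; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed starting point: a complete graph on nodes 0..m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # A node appears here once per incident edge, so a uniform choice from
    # this list is a degree-proportional choice of node.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        for target in chosen:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges
```

Comparing the degree distribution and local structure of the result with those of a real social network is one quick way to see the “from a long way off” resemblance described in point 2.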

The mixture of attendees is less diverse than last year in Istanbul, not just geographically but by “home discipline”, which is a pity.


Problems at the heart of social network analysis

About a month ago, I was at the conference on Advances in Social Networks Analysis and Mining (ASONAM), a first for me. It’s a wide-ranging conference with, for example, both sociologists and computer scientists presenting. Of course, “social networks” in this context means so-called online social networks.

I was surprised by the kinds of presentations, and I came away thinking that there are two big problems at the heart of social network analysis that are unsolved and hardly being investigated, or even thought about:

  1. We can’t artificially generate social networks that look much like the real thing. Clearly, there’s been some progress here: preferential attachment gets much closer to the real thing than random graph models did, and assortativity was another big step closer to reality; but it seems as if there must be at least one more big, subtle process that we don’t understand about how real-world social networks get built. In other words, when humans form pairwise connections, there’s some aspect of the process that we still don’t understand.
  2. There seems to be a universal assumption that an individual reveals, in his/her online social activity, a kind of homunculus of their total social activity — in other words, online behavior is a subset of real-world behavior. I only have to say this explicitly to show how fragile (maybe even foolhardy) such an assumption is. What people post on Facebook or tweet is a side-channel of their full behavioral spectrum — and an extremely odd side-channel as well. I haven’t seen any work in behavioral modelling that tries to understand how such a side-channel works and what it captures. But building models that make the subset assumption seems to me to be building castles on clouds.
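Assortativity, mentioned in point 1, is usually summarized by Newman’s degree assortativity coefficient: the Pearson correlation between the degrees of the nodes at either end of an edge. Positive values mean high-degree nodes tend to link to each other; negative values mean they tend to link to low-degree nodes. The sketch below computes it from an undirected edge list (the function name is hypothetical; NetworkX offers `degree_assortativity_coefficient` for the same purpose).

```python
from collections import Counter


def degree_assortativity(edges):
    """Newman's degree assortativity for an undirected edge list.

    Each edge contributes both (deg u, deg v) and (deg v, deg u), and the
    result is the Pearson correlation of those pairs. Returns NaN when every
    endpoint has the same degree, since the correlation is then undefined.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5
```

A star, where one hub is linked only to leaves, comes out at exactly -1, the most disassortative structure possible.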

There were some other, smaller assumptions that also seemed to me to be accepted with too little thought. First, I don’t think that centrality measures have much to tell us in any but the smallest networks, because centrality implicitly assumes that a network has a single centre. Most networks have multiple centres whether or not they contain multiple communities (clusters), and most of them do contain multiple communities anyway.
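A toy example shows how a single-centre measure can mislead. Take two hubs, each with its own leaves, joined through one intermediate node. Closeness centrality (one common measure, used here purely for illustration) ranks that intermediate node highest, even though it has degree 2 and is the centre of neither group. The graph and the function names are made up for this sketch.

```python
from collections import deque


def closeness(adj):
    """Closeness centrality, (n - 1) / sum of shortest-path distances,
    for a connected undirected graph given as {node: set_of_neighbours}."""
    n = len(adj)
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        scores[source] = (n - 1) / sum(dist.values())
    return scores


def two_hub_graph(k):
    """Hubs "A" and "B", each with k leaves, joined through a middle node "M"."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    link("A", "M")
    link("M", "B")
    for i in range(k):
        link("A", f"a{i}")
        link("B", f"b{i}")
    return adj
```

With k = 5, the middle node’s total distance to everyone else is 22 against 23 for either hub, so it edges ahead despite having degree 2 against the hubs’ 6.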

Second, I think that the creation of a link is a qualitatively different kind of action from any subsequent use of that link, and so it is important to model the two differently. This is quite tricky to do, it seems to me, but might repay the effort. Sometimes the creation is a weak signal and it’s only the use of the link that makes it a noteworthy connection; friending someone is pointless unless there’s some actual interaction. On the other hand, marrying someone is a strong, status-changing signal regardless of whether anything happens subsequently (for example, if nothing happens subsequently, immigration enforcement authorities may take an interest).
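One way to keep creation and use apart is to store them separately on each link, so that a model can weigh them differently. In the minimal sketch below, the `Link` class, the link kinds, and the rule that some kinds (here, marriage) are significant on creation alone are hypothetical choices that mirror the two examples above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical: link kinds whose creation alone is a strong signal.
SIGNIFICANT_ON_CREATION = {"married"}


@dataclass
class Link:
    """A relationship whose creation is recorded separately from its later use."""
    a: str
    b: str
    kind: str                 # illustrative labels, e.g. "friend", "married"
    created: datetime         # the one-off act of creating the link
    uses: List[datetime] = field(default_factory=list)  # each later interaction

    def record_use(self, when: datetime) -> None:
        self.uses.append(when)

    def used_since(self, since: datetime) -> bool:
        return any(t >= since for t in self.uses)


def is_noteworthy(link: Link, since: datetime) -> bool:
    """Creation alone counts for some kinds; for the rest, only use does."""
    return link.kind in SIGNIFICANT_ON_CREATION or link.used_since(since)
```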

Blurring Identity

As I posted a few days ago, if we can’t avoid having data about ourselves, our actions, and our identities collected, then one plausible strategy is to create artificial data to supplement the real data. This makes it harder to find the real identity inside the larger blurred identity.
And now there’s a tool to help: Please Dont Stalk Me, which allows Twitter users to make it look as if their tweets come from anywhere in the world.
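The blurring strategy itself is easy to sketch. The example below hides each real location among randomly generated decoys; it is not a description of how Please Dont Stalk Me works, and the function name and parameters are hypothetical. Decoys drawn uniformly in latitude and longitude are crude, since they over-represent the poles and the oceans; a more convincing tool would draw them from plausible places.

```python
import random


def blur_locations(real_points, decoys_per_point, seed=None):
    """Return each real (lat, lon) hidden among randomly generated decoys.

    Decoys are uniform in latitude and longitude, a deliberately crude choice.
    """
    rng = random.Random(seed)
    blurred = []
    for point in real_points:
        batch = [point]
        for _ in range(decoys_per_point):
            batch.append((rng.uniform(-90.0, 90.0), rng.uniform(-180.0, 180.0)))
        rng.shuffle(batch)  # so the real point isn't always first
        blurred.extend(batch)
    return blurred
```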


What is social media?

I was at a meeting last week whose focus was social media. It quickly became clear that there were two kinds of interests. One group wanted to build high-level systems that would revolutionize business and government by (somehow) leveraging social media; another group were building, or wanted to build, tools that would provide some kind of meta-view of social media content and activity.

The topic missing from all of the discussion was what social media is, and why it is the way it is; and so I came away feeling that the entire discussion, and quite a lot of work, was dancing on clouds. There seem to be a number of things that “everybody knows” about social media, but for which there is little or no evidence. The Arab Spring was driven by social media! Well, maybe, but (a) was it, and to what extent, and (b) which parts were important and which were irrelevant?

It seems helpful to divide social media into three categories:

1. Media that are essentially public-access publishing or public-access (micro)blogging. Although sites that provide this kind of functionality are often considered “social”, there is almost nothing social about them; yes, the audience for posts can be restricted to a particular group, but that’s always been true of any publication. There is an interesting question lurking here, though: why do individuals read such posts? What kind of bond does it imply between the reader and the author? (Cynically, why would I care what even my closest friend had for breakfast?)

2. Media that start as public-access publishing, but where the conversation built on an initial post is more important or interesting than the post itself; in other words, there’s something emergent in the conversation that transcends what any of the participants would have said ab initio. This is a kind of social construction of knowledge or opinion, and there are lots of interesting questions about who participates, what their roles are, and how the content and tone are affected by the interactions. This is, of course, not a new phenomenon, but what is new is the scope and detail of what’s recorded, which allows answers to be worked out in ways that were impractical or too expensive before.

3. Media in which explicit relational links are created between one person and another. This is the real heart of social media. Relational links between a pair of people have, of course, always existed, but they could only be constructed in a small number of ways and were (almost always) limited by geography.

The emergent structure of these links is a really interesting artifact that deserves study and from which we will probably learn a lot about what it means to be human in a global society. What does it mean when one person “friends” another? This is one question for which simple answers tend to be assumed, but even a brief consideration of A’s Facebook friends and the rest of A’s relationships in the real world quickly shows that there’s a complex connection between the two sets (and it depends heavily on characteristics of A).

One thing that quickly becomes clear when these questions are addressed computationally is that we aren’t going to get far until relationship links are typed. It’s fairly easy to look at each relationship and give it a numerical weight that reflects (say) closeness — but it’s still true that different kinds of relationships behave differently, and need to be modelled differently to understand them. (Social media sites should also implement this typing — not every piece of data should flow down every link of A’s social network.)
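A minimal sketch of typed links, assuming a simple per-type sharing policy: each tie carries both a type and a closeness weight, and the type, not the weight, decides which kinds of content flow down it. The relationship types, content types, and policy table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    other: str
    kind: str      # relationship type (illustrative labels)
    weight: float  # closeness on an illustrative 0..1 scale


# Hypothetical policy: which kinds of content may flow down which kinds of tie.
SHARING_POLICY = {
    "family": {"photos", "location", "status"},
    "colleague": {"status"},
    "acquaintance": set(),
}


def audience(ties, content_type):
    """Names of the people who should receive an item of this content type."""
    return sorted(
        t.other for t in ties if content_type in SHARING_POLICY.get(t.kind, set())
    )
```

A very close colleague still doesn’t receive photos, which is the point: weight and type capture different things.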

The fundamental question, in a world where one person can create a visible relationship, is what that relationship means: for the person creating it, for the person at the other end, and for the emergent graph structure that a collection of these individual relationships creates. Good, solid answers to this question would be a foundation on which much more useful applications could be built.