Posts Tagged 'social network analysis'

The power is in the edges

I’ve argued that it isn’t social media unless there are relational edges between individuals or individual objects. These edges are the drivers of power because the graph structure that emerges from them reveals a lot more than the individual nodes and edges do.

The number of LinkedIn contacts I have is now large enough that I can tell this story. I know someone from one of the more secretive US government organizations. His (or it might be her) public web presence, of course, has nothing at all to do with his day job, and we’ve never exchanged emails using his public email address. Yet LinkedIn suggests him as someone I might know.

The reason must be that we have enough mutual connections that the software LinkedIn uses sees that there “should” be some connection between us — it is doing edge prediction. This is exactly the kind of analysis that good intelligence tools can do on relational/graph data. The knowledge is in the links, collectively; in other words, noticing the potential link between us requires knowing both the presence of some links and the absence of others (because the system doesn’t recommend other people whose web presence is as dissimilar from mine as his is).
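Nothing public says how LinkedIn actually does this. The simplest textbook form of edge prediction, though, is to score every pair of people who are not yet linked by how many contacts they share. The sketch below assumes an undirected contact graph stored as a dict of sets; the names and the tiny graph are hypothetical.

```python
from itertools import combinations

def common_neighbour_scores(adj):
    """Rank unlinked pairs by how many neighbours they share.

    `adj` maps each node to the set of nodes it is linked to (undirected).
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical graph: "me" and "spook" share three contacts but are not linked.
adj = {
    "me": {"a", "b", "c"},
    "spook": {"a", "b", "c"},
    "a": {"me", "spook"},
    "b": {"me", "spook"},
    "c": {"me", "spook", "d"},
    "d": {"c"},
}
suggestions = common_neighbour_scores(adj)
print(suggestions[0])  # (('me', 'spook'), 3)
```

Neither endpoint's own profile is consulted; the suggestion comes entirely from the surrounding links. Real systems use richer scores, but the principle is the same.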

So, well done LinkedIn, but a cautionary tale for security folks generally, and especially those who believe in anonymization — it can’t be done!

What is social media?

I was at a meeting last week whose focus was on social media. It quickly became clear that there were two kinds of interests. One group wanted to build high-level systems that would revolutionize business and government (somehow) leveraging social media; another group were building or wanted to build tools that would provide some kind of meta-view of social media content and activity.

The topic missing from all of the discussion was what social media is, and why it is the way it is; so I came away feeling that the entire discussion, and quite a lot of work, was dancing on clouds. There seem to be a number of things that “everybody knows” about social media, but for which there is little or no evidence. The Arab Spring was driven by social media! Well, maybe, but (a) was it, and if so, how much? (b) Which parts were important and which were irrelevant?

It seems helpful to divide social media into three categories:

1. Media that is essentially public access publishing or public access (micro)blogging. Although sites that provide this kind of functionality are often considered “social”, there is almost nothing social about them. Yes, the audience for posts can be restricted to a particular group, but that’s always been true of any publication. There is an interesting question lurking here, though: why do individuals read such posts? What kind of bond does it imply between the reader and the author? (Cynically, why would I care what even my closest friend had for breakfast?)

2. Media that start as public access publishing, but where the conversation built on an initial post is more important or interesting than the initial post itself. In other words, there’s something emergent in the conversation that transcends what any of the participants would have said ab initio. This is a kind of social knowledge or opinion construction, and there are lots of interesting questions about who participates, what their roles are, and how the content and tone are affected by the interactions. This is, of course, not a new phenomenon, but what’s new is the scope and the detail of what’s recorded, which allows these questions to be answered in ways that were impractical or too expensive before.

3. Media in which explicit relational links are created between one person and another. This is the real heart of social media. Relational links between a pair of people have, of course, always existed, but they could only be constructed in a small number of ways and were (almost always) limited by geography.

The emergent structure of these links is a really interesting artifact that deserves study, and from which we will probably learn a lot about what it means to be human in a global society. What does it mean when one person “friends” another? This is one question for which simple answers tend to be assumed, but even a brief comparison of A’s Facebook friends with the rest of A’s real-world relationships quickly shows that there’s a complex connection between the two sets (and it depends heavily on characteristics of A).

One thing that quickly becomes clear when these questions are addressed computationally is that we aren’t going to get far until relationship links are typed. It’s fairly easy to look at each relationship and give it a numerical weight that reflects (say) closeness, but different kinds of relationships still behave differently and need to be modelled differently to understand them. (Social media sites should also implement this typing: not every piece of data should flow down every link of A’s social network.)
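A minimal sketch of what typed links might look like as a data structure, assuming each edge carries both a relationship type and a closeness weight. The three types and the `audience` helper are hypothetical illustrations, not any site’s real schema. The point is that a post is routed by type, which a single weight cannot express.

```python
from dataclasses import dataclass
from enum import Enum

class Rel(Enum):
    # Hypothetical relationship types; a real system would need many more.
    FAMILY = "family"
    COLLEAGUE = "colleague"
    ACQUAINTANCE = "acquaintance"

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: Rel
    weight: float  # closeness, say in [0, 1]; kept alongside the type, not instead of it

def audience(edges, owner, allowed):
    """People who should see an item posted by `owner`, given the
    relationship types the item is allowed to flow along."""
    return {e.dst for e in edges if e.src == owner and e.kind in allowed}

edges = [
    Edge("A", "B", Rel.FAMILY, 0.9),
    Edge("A", "C", Rel.COLLEAGUE, 0.6),
    Edge("A", "D", Rel.ACQUAINTANCE, 0.2),
]
family_only = audience(edges, "A", {Rel.FAMILY})
print(sorted(family_only))  # ['B']
```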

The fundamental question, in a world where one person can create a visible relationship, is what this means: for the person creating it, for the person at the other end of the relationship, and for the emergent graph structure that a collection of these individual relationships creates. Good, solid answers to this question would be a foundation on which much more useful applications could be built.

Structure of social network graphs

Many researchers study social network graphs to try to understand how we as humans interact, especially in information systems and online. However, it has always been difficult to validate results because privacy concerns usually limit access to real datasets. Many results have therefore been validated using artificial graphs, generated in a way that mimics the large-scale properties of real graphs: they obey power laws, have the right kinds of degree sequences, and so on.

Often, preferential attachment is used as the construction technique. In this algorithm, each new edge is attached to a vertex with probability proportional to the number of edges already attached to it. This seems intuitively plausible in many human settings: a person with many friends tends to meet more people and so has a greater chance of making more friends.
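A minimal sketch of a preferential-attachment generator in the common Barabási–Albert style. It assumes each new vertex arrives with `m` edges and that growth starts from a small clique; both choices are assumptions for illustration, and other seeding schemes exist. Sampling uniformly from a list in which every vertex appears once per incident edge is exactly degree-proportional sampling.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=None):
    """Grow an undirected graph to n vertices (n > m >= 1). Each new vertex
    links to m distinct existing vertices, chosen with probability
    proportional to their current degree."""
    rng = random.Random(seed)
    # Seed graph: a clique on m + 1 vertices, so every vertex has degree > 0.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each vertex appears here once per incident edge.
    targets = [v for e in edges for v in e]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))  # degree-proportional choice
        for old in chosen:
            edges.append((new, old))
            targets.extend((new, old))
    return edges

edges = preferential_attachment(1000, 3, seed=1)
degrees = Counter(v for e in edges for v in e)
print(len(edges), max(degrees.values()))  # a few early vertices become hubs
```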

There have been hints for a while that these artificial graphs were not quite like real graphs, even though they match according to many large-scale measures. For example, Newman showed that, in human graphs, high-degree nodes tended to be connected to other high-degree nodes, while in technological networks high-degree nodes tended to be connected to low-degree ones, even though both kinds of network looked the same from a power-law perspective.

But now Faloutsos’s group at CMU have shown convincingly that there are substantial differences between artificial and real graphs. They looked at what happens to the diameter of a graph as edges are deleted uniformly at random. As edges are deleted, the diameter grows slowly but, at some point, there’s a sharp increase. They call this the shatter point.

The important thing is that the shatter point of artificial graphs (i.e. the fraction of edges remaining when the jump happens) is substantially higher than for real graphs: artificial graphs shatter while they still have more of their edges. In other words, graphs generated by humans, rather than simply by preferential attachment, are somehow tougher. Although humans must be choosing which edges to create based on local criteria, they must somehow do so in a way that makes the global structure of the graph more robust. It’s not at all clear (to me at least) how this happens, but it seems plausible.
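The edge-deletion experiment can be sketched as follows. Because the graph quickly falls apart into pieces, the sketch measures a simplified, non-interpolated “effective diameter”: the 90th percentile of shortest-path lengths over pairs that are still connected. This choice of measure is an assumption; the CMU group’s exact definition may differ. The all-pairs BFS makes each point cost roughly O(n·m), so this is only suitable for small graphs.

```python
import random
from collections import deque

def bfs_distances(adj, src):
    """Hop distances from src to every vertex reachable from it."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def effective_diameter(nodes, edges, q=0.9):
    """q-quantile of shortest-path lengths over connected (ordered) pairs.
    `nodes` should be a list, since it is iterated more than once."""
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    lengths = sorted(d for s in nodes
                     for t, d in bfs_distances(adj, s).items() if t != s)
    if not lengths:
        return 0  # no edges left, so no connected pairs
    return lengths[min(len(lengths) - 1, int(q * len(lengths)))]

def deletion_curve(nodes, edges, steps=10, seed=None):
    """Delete edges uniformly at random; return (fraction kept, diameter) pairs."""
    rng = random.Random(seed)
    order = list(edges)
    rng.shuffle(order)
    curve = []
    for i in range(steps + 1):
        kept = order[: round(len(order) * (1 - i / steps))]
        curve.append((len(kept) / len(order), effective_diameter(nodes, kept)))
    return curve
```

Plotting the curve for a real graph and for an artificial graph of the same size should show where each one’s diameter jumps; reading off that point is left to the eye here rather than detected automatically.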

One of the implications of this difference is that it calls into question much of the conventional wisdom about social networks, whenever this has been derived from, or validated by, artificially generated datasets. Which is quite a lot of the time.

Call for Papers: Link Analysis, Counterterrorism and Security

The Call for Papers for the LACTS 2009 workshop is now available.

The workshop takes place at the SIAM Data Mining Conference and brings together academics, practitioners, law enforcement, and intelligence people to talk about leading-edge work in the area of adversarial data analysis.

The workshop is intended primarily for early-stage work. The proceedings are published electronically, but authors may retain copyright.

The deadline for submissions is probably late December, but perhaps a little later (still being decided).


