Posts Tagged 'Facebook'

Businesses processing emails

The Daily Mail reports an experiment by the company High-Tech Bridge, which sent private emails or uploaded documents containing unique URLs to 50 different platforms, and then waited to see whether these URLs were visited, and by whom.

Sure enough, several of them were visited by the businesses that had handled the matching document, including Facebook, Twitter, and Google. This won’t come as a surprise to readers of this blog, but once again points out the extent to which businesses like these are processing any documents they see to extract models of the sender/receiver.
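The logic of an experiment like this is simple enough to sketch. What follows is a minimal illustration, not High-Tech Bridge's actual method: the domain `example.test`, the function names, and the idea of comparing against a list of visited URLs are all assumptions made for the sake of the example.

```python
import secrets

# Hypothetical domain controlled by the experimenter (an assumption).
BASE_URL = "https://example.test/t/"

def make_tracking_urls(platforms):
    """Return {platform: url}, with one unguessable URL per platform."""
    return {p: BASE_URL + secrets.token_urlsafe(16) for p in platforms}

def attribute_visits(urls, visited_urls):
    """Return the sorted platforms whose unique URL shows up in the visit log."""
    by_url = {url: platform for platform, url in urls.items()}
    return sorted({by_url[v] for v in visited_urls if v in by_url})
```

Because each URL is sent to exactly one platform and is effectively impossible to guess, any request for it must have come from something that saw the message on that platform.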

There has been some confusion in the media about how this process might work. Evidently it’s not obvious to many that such a process is automated: nobody is ‘reading’ these documents; they’re being processed by software that is capable of fetching the pages that the URLs point to, and processing the contents of those pages as well. It would help if we agreed on verbs, simple enough for the wider public to understand, that distinguished ‘read by a human’ from ‘processed by software’.

Problems at the heart of social network analysis

About a month ago, I was at the conference on Advances in Social Networks Analysis and Mining (ASONAM), a first for me. It’s a wide-ranging conference with, for example, both sociologists and computer scientists presenting. Of course, “social networks” in this context means so-called online social networks.

I was surprised by the kind of presentations, and I came away thinking that there are two big problems at the heart of social network analysis that are unsolved and that are hardly being investigated, or even thought about:

  1. We can’t generate social networks artificially that look much like the real thing. Clearly, there’s been some progress here: preferential attachment gets much closer to the real thing than random graph models did; assortativity was another big step closer to reality; but it seems as if there must be at least one more big subtle process that we don’t understand about how real-world social networks get built. In other words, when humans form pairwise connections, there’s some aspect of that that we still don’t understand.
  2. There seems to be a universal assumption that an individual reveals, in his/her online social activity, a kind of homunculus of their total social activity — in other words, online behavior is a subset of real-world behavior. I only have to say this explicitly to show how fragile (maybe even foolhardy) such an assumption is. What people post on Facebook or tweet is a side-channel of their full behavioral spectrum — and an extremely odd side-channel as well. I haven’t seen any work in behavioral modelling that tries to understand how such a side-channel works and what it captures. But building models that make the subset assumption seems to me to be building castles on clouds.
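To make the first point concrete, here is a small sketch contrasting preferential attachment with a uniformly random graph that has the same number of edges. It is a simplified version (each new node adds a single link) and the function names are my own; it is meant only to show the mechanism, not to reproduce any particular published model.

```python
import random
from collections import Counter

def preferential_attachment(n, rng):
    """Grow a tree on n nodes: each new node links to an existing node
    chosen with probability proportional to that node's current degree."""
    edges = [(0, 1)]
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n):
        target = rng.choice(endpoints)  # degree-proportional choice
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges

def uniform_random_graph(n, m, rng):
    """Pick m distinct edges uniformly at random among n nodes."""
    edges = set()
    while len(edges) < m:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return sorted(edges)

def max_degree(edges):
    """Largest number of edges incident on any single node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return max(degree.values())
```

Calling both generators with a few thousand nodes and comparing `max_degree` shows the familiar difference: preferential attachment produces a handful of very high-degree hubs, while the uniform model does not. Neither, as noted above, looks much like a real social network.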

There were some other, smaller assumptions that seemed to me to be accepted with too little thought as well. First, I don’t think that centrality measures have much to tell us in all but the smallest networks, because centrality implicitly assumes that a network has a single centre — but most networks have multiple centres whether or not they contain multiple communities (clusters); and anyway they mostly do.
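A toy example shows how quickly the idea of a single centre breaks down. The sketch below builds a hypothetical network of two hubs joined through one bridging node (the node names and the helper functions are illustrative assumptions) and compares degree centrality with closeness centrality.

```python
from collections import deque

def two_hub_graph(leaves_per_hub):
    """Two star-shaped clusters whose hubs are joined via one bridge node."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    link("hub_a", "bridge")
    link("hub_b", "bridge")
    for hub in ("hub_a", "hub_b"):
        for i in range(leaves_per_hub):
            link(hub, f"{hub}_leaf{i}")
    return adj

def closeness(adj, source):
    """(n - 1) divided by the sum of shortest-path distances from source.
    Assumes the graph is connected."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())

def top_nodes(scores):
    """All nodes that share the highest score."""
    best = max(scores.values())
    return sorted(node for node, s in scores.items() if s == best)
```

With five leaves per hub, degree centrality ranks the two hubs as tied for first place, while closeness centrality ranks the bridge (a node with only two links) first. Even in a thirteen-node network, "the centre" depends on which measure is asked.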

Second, I think that the creation of a link is qualitatively a different kind of action to any subsequent use of that link, and so it is important to model them differently. This is quite tricky to do, it seems to me, but might repay the effort. Sometimes, the creation is a weak signal and it’s only the use of the link that makes it a noteworthy connection; friending someone is pointless unless there’s some actual interaction. On the other hand, marrying someone is a strong status-changing signal regardless of whether anything happens subsequently (for example, if nothing happens subsequently, immigration enforcement authorities may take an interest).

What is social media?

I was at a meeting last week whose focus was on social media. It quickly became clear that there were two kinds of interests. One group wanted to build high-level systems that would revolutionize business and government (somehow) leveraging social media; another group were building or wanted to build tools that would provide some kind of meta-view of social media content and activity.

The topic that was missing from all of the discussion was what social media is, and why it is the way it is; and so I came away feeling like the entire discussion, and quite a lot of work, was dancing on clouds. There seem to be a number of things that “everybody knows” about social media, but for which there seems to be little or no evidence. The Arab Spring was driven by social media! Well, maybe, but (a) was it, and if so how much, and (b) which parts were important and which were irrelevant?

It seems helpful to divide social media into three categories:

1. Media that are essentially public access publishing or public access (micro)blogging. Although sites that provide this kind of functionality are often considered “social”, there is almost nothing social about them: yes, the audience for posts can be restricted to a particular group, but that’s always been true of any publication. There is an interesting question lurking here though: what are the reasons why individuals read such posts? What kind of bond does it imply between the reader and the author? (Cynically, why would I care what even my closest friend had for breakfast?)

2. Media that start as public access publishing, but where the conversation built on an initial post is more important or interesting than the initial post itself — in other words, there’s something emergent in the conversation that transcends what any of the participants would have said ab initio. This is a kind of social knowledge or opinion construction, and there are lots of interesting questions about who participates, what their roles are, and how the content and tone are affected by the interactions. This is, of course, not a new phenomenon but what’s new is the scope and the detail of what’s recorded, allowing answers to be worked out in ways that were impractical or too expensive before.

3. Media in which explicit relational links are created between one person and another. This is the real heart of social media. Relational links between a pair of people have, of course, always existed, but they could only be constructed in a small number of ways and were (almost always) limited by geography.

The emergent structure of these links is a really interesting artifact that deserves study and from which we will probably learn a lot about what it means to be human in a global society. What does it mean when one person “friends” another? This is one question for which simple answers tend to be assumed, but even a brief consideration of A’s Facebook friends and the rest of A’s relationships in the real world quickly shows that there’s a complex connection between the two sets (and it depends heavily on characteristics of A).

One thing that quickly becomes clear when these questions are addressed computationally is that we aren’t going to get far until relationship links are typed. It’s fairly easy to look at each relationship and give it a numerical weight that reflects (say) closeness — but it’s still true that different kinds of relationships behave differently, and need to be modelled differently to understand them. (Social media sites should also implement this typing — not every piece of data should flow down every link of A’s social network.)
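As a rough sketch of what typed links might look like in practice, the following represents each relationship with both a type and a numerical weight, and uses the type to decide where a piece of data may flow. The type labels, the field names, and the `audience` function are hypothetical; no existing site's data model is being described.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Link:
    source: str
    target: str
    kind: str      # relationship type, e.g. "family" or "colleague" (illustrative labels)
    weight: float  # closeness, assumed here to lie in [0, 1]

def audience(links, author, allowed_kinds):
    """Return the sorted targets of the author's links whose type is allowed."""
    return sorted({
        link.target
        for link in links
        if link.source == author and link.kind in allowed_kinds
    })
```

A weight alone cannot express this: a close colleague and a distant cousin might have similar weights, yet a family photo should reach only one of them.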

The fundamental question, in a world where one person can create a visible relationship, is what this means: for the person creating it, for the person at the other end of the relationship, and for the emergent graph structure that a collection of these individual relationships creates. Good, solid answers to this question would be a foundation on which much more useful applications could be built.

Privacy and social media

I was at a meeting last week at which one of the speakers said this (roughly paraphrased): 15 years ago, the amount of data visible on a typical Facebook user’s profile page would have required a warrant to collect (and the warrant would have been difficult to get). 100 years ago this amount of data probably couldn’t have been collected, at least not at reasonable cost.

I think he’s probably right. Empirical data, rather than academic theorizing, has consistently shown that people are willing to go public with an amazing amount of data about themselves. This decision may be pragmatic: being visible brings benefits that outweigh the risks; it may be ignorance of what those risks are; it may be the inability to understand, in a visceral way, just how public something posted on the internet is and how long it will last. As far as I know, there’s been little concrete research on this issue.

This massive release of personal data is changing the discussion of what privacy is and what its role in society should be. This is especially true in places like the U.S. where the relevant law is expressed in terms of what the social expectation of privacy is — so that the boundary between public and private moves “automatically” as society changes.

But it’s worth reminding ourselves that little more than 100 years ago, nobody had any privacy, in the sense that everyone in your village or town knew everything about you, including everything about your whole life history and that of your parents and grandparents and so on. Until about 100 years ago, almost nobody was ever alone, either inside or outside. The whole idea of privacy is an invention of urbanisation where, for the first time in history, someone other than a hermit could act anonymously. It’s also an invention of secularization since, in most religious traditions, God is conceived of as omniscient so that no human could act anonymously or invisibly in a deep sense.