Finding things you don’t already know

One of the big problems with the tools available on the web is that they are all convergent: they help you find out more about things you already know something about. Search engines are the most obvious example. Almost nobody looks beyond the first one or two results, so there’s almost no opportunity to be surprised by what a query returns. Blogs and tags look, at first glance, as if they’re a little better; but after a while you come to realize that an awful lot of content is recycled among different blogs, and genuinely coming across something new and unexpected is a rarity.

We started working, some years ago, on a divergent information system called Athens. It’s designed to show you things that you don’t already know about, but which you are well-positioned to understand. So you get to find out new things, but not just random new things — things that should make some sense to you.

In a nutshell, the way it works is as follows:

  • You provide some “search terms” that indicate an area that you already know about.
  • To reduce the dependence of the system on your particular choice of initial terms, Athens runs them as a query, fetches a large set of pages, extracts their content, and treats that content as the definition of the “area you already know about”.
  • Athens then searches out one level from this initial area using a combination of your original search terms and new terms discovered by analyzing the initial area. A large set of pages is retrieved and clustered. The content of these new pages is assumed to be something you already know about, because it is so closely related to the initial subject area.
  • Athens then repeats the process using combinations of search terms from the new clusters and from the topics of the initial area. Again, a large set of pages is retrieved and clustered.
  • These clusters are presented as the results of the search. They are two levels away from the initial search, and so are likely to be both novel and related to the initial area of knowledge.
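
The steps above can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions, not how Athens is actually built: the in-memory CORPUS, the search and top_terms helpers, and the divergent_search function are all hypothetical names invented for the example, "understanding" an area is reduced to counting word frequencies, the second expansion uses only the newly found terms, and the clustering step is left out entirely.

```python
from collections import Counter

# Toy stand-in for the web: page id -> page text (made-up data).
CORPUS = {
    "p1": "jazz saxophone improvisation",
    "p2": "jazz improvisation harmony",
    "p3": "harmony counterpoint baroque",
    "p4": "baroque fugue organ",
    "p5": "organ pipes acoustics",
    "p6": "cooking pasta sauce",
}


def search(terms):
    """Hypothetical search: ids of pages containing any of the terms."""
    wanted = set(terms)
    return {pid for pid, text in CORPUS.items() if wanted & set(text.split())}


def top_terms(page_ids, exclude=(), k=3):
    """The k most frequent words across the pages, skipping excluded words."""
    counts = Counter(
        word
        for pid in sorted(page_ids)  # sorted for a repeatable tie order
        for word in CORPUS[pid].split()
        if word not in exclude
    )
    return [word for word, _ in counts.most_common(k)]


def divergent_search(seed_terms):
    """Return (known area, level-one pages, level-two pages)."""
    # The seed terms fetch pages that define the area already known.
    known = search(seed_terms)
    known_terms = set(seed_terms) | set(top_terms(known, k=5))

    # One level out: pages reached via the original and discovered terms.
    # These are assumed to be familiar, so they are not shown as results.
    level1 = search(known_terms) - known
    level1_terms = top_terms(level1, exclude=known_terms)

    # Two levels out: pages reached only through the level-one terms.
    level2 = search(level1_terms) - known - level1
    return known, level1, level2


known, level1, level2 = divergent_search(["jazz"])
print(sorted(known), sorted(level1), sorted(level2))
```

With this made-up corpus, a query for "jazz" ends up at the page about baroque fugues, two hops away through "harmony" and "counterpoint", while the unrelated cooking page is never reached. That is the intended flavor of the result: novel, but still connected to where you started.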

This works well, but there are some intricate problems along the way.

The first is that people actually have trouble getting their minds around the idea of novel knowledge. They keep being sucked into expectations derived from search engines such as Google and Yahoo — they expect, unconsciously, to see things that look familiar. And when they don’t, they want to see and be reassured about the connections between the new ideas and the old.

The second problem is that it’s hard to present the content of a cluster of novel information in a way that makes sense, given that the user doesn’t know yet what it is about. This goes to show how much we fill in the results of ordinary search with background and contextual information without realizing that we are doing it.

The third problem is how long the process takes, which tends to surprise people. A search engine query takes less than a second. An Athens query can take 8 to 10 hours!

Athens is useful for novel knowledge discovery in a number of different ways, which I’ll talk about in the next post.

