Microblogging workshop at CHI2010

Yesterday’s microblogging workshop at CHI2010 was great, as those of you following #CHImb on twitter already know.  All of the participants brought interesting ideas–too many to list here.  So I’m just going to focus on a few themes/results that relate most closely to IR.  I highly recommend browsing the list of accepted papers to see for yourself the many, many interesting contributions.

First, I’ll mention that Gene Golovchinsky did a wonderful job presenting our paper on making sense of twitter search.  Gene has posted his slides and some discussion of the workshop.  The questions we posed in the paper and the presentation were:

  • What information needs do people actually bring to microblog search?
  • What should a test collection for conducting research on microblog search look like?

Instead of dwelling on our own contribution, though, I want to offer a recap of some of the work of other people…

I was especially interested in work by several researchers from Xerox PARC.

Michael Bernstein showed a system, eddi, that helps readers who follow many people manage their twitter experience, avoiding information overload via intelligent filtering on several levels.  Ed Chi introduced FeedWinnower, another ambitious system for managing twitter information.  Bongwon Suh’s talk stood out for me.  He focused on the role that serendipity plays (or should play) in twitter search.  He suggested that search over microblog data (I know, microblog is not equal to twitter) benefits from serendipity.  Of course only certain types of serendipity are valuable in this context (he said something to the effect of courting previously unknown relevance).

Another really interesting paper (and an interesting conversation over lunch) came from Alice Oh.  The paper focused on using people’s list memberships to induce models of their interests and expertise. I think Alice’s paper speaks to the challenge of finding sources of evidence for information management in microblog environments.

With respect to IR and microblogging, I came away from the workshop with new questions and with a keener edge on questions I already had. Here’s a very abbreviated list of some challenges that researchers in this area face.

information needs: What types of information needs are most germane in this space?  Are users interested in known-item search, ad hoc retrieval, recommendations, browsing, something completely new?

unit of retrieval: Of course this goes back to the matter of information needs (as do all of the following points).  Certainly the task at hand will sway exactly what it is that systems should show users.  But my sense is that some sort of entity search is almost always likely to be of more value than treating an individual tweet as a ‘document.’  That is, search over people, conversations, communities, hashtags, etc. will, I think, lend more value than tweets taken out of context.
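To make the entity idea a bit more concrete, here is a minimal sketch of my own, not anything presented at the workshop. It assumes tweets arrive as (author, text) pairs, treats authors and hashtags as the entities, and uses a crude term-overlap count as the per-tweet score; the function names `match_count` and `rank_entities` are hypothetical.

```python
from collections import defaultdict

def match_count(query, text):
    """Number of distinct query terms found in the text (a crude, assumed relevance signal)."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split()))

def rank_entities(query, tweets):
    """Aggregate per-tweet scores onto entities (authors and hashtags).

    tweets: iterable of (author, text) pairs. This input shape is an assumption.
    Returns a list of (entity, total_score), highest score first, ties broken by name.
    """
    totals = defaultdict(int)
    for author, text in tweets:
        score = match_count(query, text)
        if score == 0:
            continue  # tweets that match nothing lend no evidence to their entities
        totals["@" + author] += score
        # Each distinct hashtag in a matching tweet inherits that tweet's score.
        hashtags = {tok for tok in text.lower().split() if tok.startswith("#") and len(tok) > 1}
        for tag in hashtags:
            totals[tag] += score
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

if __name__ == "__main__":
    sample = [
        ("alice", "twitter search talk #chimb"),
        ("bob", "lunch #chimb"),
        ("alice", "search evaluation ideas"),
    ]
    for entity, score in rank_entities("search", sample):
        print(entity, score)
```

Summing scores is only one way to roll tweets up into entities; averaging, or taking the best single tweet, would favor different kinds of people and hashtags.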

data acquisition and evaluation: It’s easy to get lots of twitter data; just latch onto the garden hose and go.  In some cases, data from the hose may be perfectly useful for research and development. Do we need or want formal test collections of this type of data?  If so, what should they look like?  How does obsolescence figure into creating a test collection of de facto ephemeral data?  And of course, there’s probably more to ground truth than Mechanical Turk.

objective functions: In the arena of microblog search, what criteria should we use to rank (if we ARE ranking) entities?  Certainly twitter’s own search engine sees temporality as paramount.  As always, relevance is dicey here–a murky mixture of topicality, usefulness, trustworthiness, timeliness, etc.
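As a toy illustration of how two of these criteria might trade off, here is a sketch under my own assumptions (it is not how twitter's search engine works). It blends a crude topicality score with an exponential recency decay; the half-life, the linear weighting, and all function names are hypothetical choices.

```python
def topical_score(query, text):
    """Fraction of query terms that appear in the text (a crude stand-in for topicality)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(text.lower().split())) / len(query_terms)

def recency_score(age_seconds, half_life_seconds=3600.0):
    """Exponential decay: 1.0 for a brand-new post, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life_seconds)

def combined_score(query, text, age_seconds, weight_topical=0.5):
    """Linear blend of topicality and recency; weight_topical is assumed to lie in [0, 1]."""
    return (weight_topical * topical_score(query, text)
            + (1.0 - weight_topical) * recency_score(age_seconds))

def rank(query, posts, now, weight_topical=0.5):
    """posts: list of (text, timestamp_in_seconds) pairs. Returns posts, best first."""
    return sorted(
        posts,
        key=lambda post: combined_score(query, post[0], now - post[1], weight_topical),
        reverse=True,
    )

if __name__ == "__main__":
    posts = [("chi2010 workshop notes", 0), ("lunch plans", 7200)]
    print(rank("chi2010 workshop", posts, now=7200, weight_topical=1.0))
    print(rank("chi2010 workshop", posts, now=7200, weight_topical=0.0))
```

Setting `weight_topical` to 0 reduces this to a purely time-ordered list, while setting it to 1 ignores time entirely. Usefulness and trustworthiness have no obvious formula here, which is part of why relevance stays murky.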

By way of a parting shot, I’d like to thank Julia Grace, Dejin Zhao, and danah boyd for organizing the workshop.


