Microblogging workshop at CHI2010

Yesterday’s microblogging workshop at CHI2010 was great, as those of you following #CHImb on twitter already know.  All of the participants brought interesting ideas–too many to list here.  So I’m just going to focus on a few themes/results that relate most closely to IR.  I highly recommend browsing the list of accepted papers to see for yourself the many, many interesting contributions.

First, I’ll mention that Gene Golovchinsky did a wonderful job presenting our paper on making sense of twitter search.  Gene has posted his slides and some discussion of the workshop.  The questions we posed in the paper and the presentation were:

  • What information needs do people actually bring to microblog search?
  • What should a test collection for conducting research on microblog search look like?

Instead of dwelling on our own contribution, though, I want to offer a recap of some of the work of other people…

I was especially interested in work by several researchers from Xerox PARC.

Michael Bernstein showed a system, eddi, that helps readers who follow many people manage their twitter experience, avoiding information overload via intelligent filtering on several levels. Ed Chi introduced FeedWinnower, another ambitious system for managing twitter information. Bongwon Suh’s talk stood out for me. He focused on the role that serendipity plays (or should play) in twitter search. He suggested that search over microblog data (I know, microblog is not equal to twitter) benefits from serendipity. Of course, only certain types of serendipity are valuable in this context (he said something to the effect of courting previously unknown relevance).

Another really interesting paper (and an interesting conversation over lunch) came from Alice Oh.  The paper focused on using people’s list memberships to induce models of their interests and expertise. I think Alice’s paper speaks to the challenge of finding sources of evidence for information management in microblog environments.

With respect to IR and microblogging, I came away from the workshop with new questions and with a keener edge on questions I already had. Here’s a very abbreviated list of some challenges that researchers in this area face.

information needs: What types of information needs are most germane in this space?  Are users interested in known-item search, ad hoc retrieval, recommendations, browsing, something completely new?

unit of retrieval: Of course this goes back to the matter of information needs (as do all of the following points). Certainly the task at hand will sway exactly what it is that systems should show users. But my sense is that some sort of entity search is almost always likely to be of more value than treating an individual tweet as a ‘document.’ That is, search over people, conversations, communities, hashtags, etc. will, I think, lend more value than tweets taken out of context.

data acquisition and evaluation: It’s easy to get lots of twitter data; just latch onto the garden hose and go. In some cases, data from the hose may be perfectly useful for research and development. Do we need or want formal test collections of this type of data? If so, what should they look like? How does obsolescence figure into creating a test collection of de facto ephemeral data? And of course, there’s probably more to ground truth than Mechanical Turk.

objective functions: In the arena of microblog search, what criteria should we use to rank (if we ARE ranking) entities?  Certainly twitter’s own search engine sees temporality as paramount.  As always, relevance is dicey here–a murky mixture of topicality, usefulness, trustworthiness, timeliness, etc.
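
To make the question concrete, here is a toy sketch of one possible objective function that trades topicality off against timeliness. It is not twitter's ranking or anything proposed at the workshop: the function name, the half-life parameter, and the choice of exponential decay are all assumptions made purely for illustration.

```python
def recency_weighted_score(query_terms, item_terms, age_hours, half_life_hours=6.0):
    """Toy objective: topical overlap discounted by how old the item is.

    Assumptions (not from any real system): topicality is the fraction of
    query terms found in the item, and its value halves every
    `half_life_hours` hours.
    """
    query = set(query_terms)
    if not query:
        return 0.0
    topicality = len(query & set(item_terms)) / len(query)
    recency = 0.5 ** (age_hours / half_life_hours)
    return topicality * recency


# Hypothetical items: (terms, age in hours).
items = [
    (["chi", "workshop", "notes"], 12.0),   # fully on topic, but older
    (["chi", "keynote"], 0.0),              # half on topic, brand new
]
query = ["chi", "workshop"]
ranked = sorted(
    items,
    key=lambda item: recency_weighted_score(query, item[0], item[1]),
    reverse=True,
)
```

With a six-hour half-life the fresh, partly relevant item outranks the older, fully relevant one, which is exactly the kind of behavior that needs to be checked against real information needs. Usefulness and trustworthiness are left out entirely.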

By way of a parting shot, I’d like to thank Julia Grace, Dejin Zhao, and danah boyd for organizing the workshop.

twitter at asist 2009

Recently I’ve been speaking with several folks (e.g. Megan Winget and Gene Golovchinsky) about how twitter is or might be important with respect to academic conferences. I’ve got some research coming up where I hope to look at this.

But in the meantime, I put this together:


A screenshot:

[Screenshot: “ASIST-related words,” the words appearing in recent tweets tagged with #asist, #asist09, #asist2009, #asist2010]

People heading to ASIST 2009 might be interested in it.  The page just gives a snapshot (updated hourly) of words from tweets tagged with #asist, #asist09, #asist2009, and #asist2010.
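
For the curious, a snapshot like this boils down to counting words across tagged tweets. The following is a minimal sketch of that idea, not the code behind the page: the function names, the tiny stopword list, and the sample tweets are all made up for illustration, and fetching the tweets is left out.

```python
import re
from collections import Counter

# The four tags come from the post; everything else here is illustrative.
TAGS = {"#asist", "#asist09", "#asist2009", "#asist2010"}
STOPWORDS = {"the", "a", "an", "of", "to", "at", "on", "in", "is", "and", "for"}


def tokenize(text):
    """Lowercase a tweet and split it into hashtags and plain words."""
    return re.findall(r"#?\w+", text.lower())


def word_snapshot(tweets, tags=TAGS, top_n=10):
    """Count words across the tweets that carry at least one of the tags."""
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        if not tags.intersection(tokens):
            continue  # not tagged for the conference, so ignore it
        counts.update(
            token for token in tokens
            if not token.startswith("#") and token not in STOPWORDS
        )
    return counts.most_common(top_n)


# Made-up sample tweets.
sample = [
    "Great panel on search at #asist09",
    "Search and retrieval talk #ASIST2009",
    "unrelated tweet about search",
]
snapshot = word_snapshot(sample)
```

Rerunning something like this every hour over freshly collected tweets would give the kind of rolling snapshot described above.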

A caveat: having slapped this together quickly, I’m not sure how the site will behave…I hope it is relatively solid.

I put the page up just because it seemed like a natural thing to do given the data that I’ve been collecting (relatively large amounts of twitter-generated info).  I’m hoping that it might, even a little bit, encourage the conference attendees to think of twitter as they listen, chat, etc.

handling hashtags

Twitter hashtags are a great tool for improvised info organization–i.e. using software features to marshal information in ways that the feature designer didn’t think up (and made no pretense of thinking up). In particular, I’ve been thinking of hashtags as a hack to support collaborative IR. Need to research the size of Google Scholar’s index? Mark relevant resources with, say, #search.GSsize . Others interested in the same topic could add to the body of knowledge related to the search, and could follow the collected resources.

Of course this is what hashtags are for, so I’m not proposing anything very new here.

But this idea got me thinking of a few services that would support hashtag use for collaborative IR:

  1. intelligent search for tags
  2. hashtag disambiguation

Other services like recommendation also leap to mind.

By intelligent search, I’m thinking of a way to find tags that are relevant to a particular topic.  hashtags.org/tags already collects tags.  But as far as I know (please correct me if I’m wrong) existing hashtag search simply supports string matches.  It’s difficult to find semantically useful tags. This would frustrate any kind of real collaborative use of them.
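
As a rough illustration of what going beyond string matching might look like, the sketch below describes each tag by the words that appear alongside it and ranks tags by cosine similarity to a query. This is only one simple approach, assumed here for the sake of example; the function names and the sample tweets are hypothetical, and it says nothing about how any existing hashtag service works.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a tweet and split it into hashtags and plain words."""
    return re.findall(r"#?\w+", text.lower())


def tag_profiles(tweets):
    """Map each hashtag to a bag of the plain words used alongside it."""
    profiles = defaultdict(Counter)
    for tweet in tweets:
        tokens = tokenize(tweet)
        words = [t for t in tokens if not t.startswith("#")]
        for tag in (t for t in tokens if t.startswith("#")):
            profiles[tag].update(words)
    return profiles


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_tags(query, tweets, top_n=5):
    """Rank tags by how closely their accompanying text matches the query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, profile), tag)
        for tag, profile in tag_profiles(tweets).items()
    ]
    scored = [(score, tag) for score, tag in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:top_n]]


# Made-up tweets; note that "#gssize" shares no substring with the query.
sample = [
    "estimating the size of the google scholar index #gssize",
    "oil painting linseed medium tips #paint",
    "scholar index coverage study #gssize",
]
hits = search_tags("google scholar index size", sample)
```

Here the query finds #gssize even though a plain string match on the tag itself would miss it. Raw word overlap is still a long way from real semantics, of course.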

As for hashtag disambiguation, I simply mean trying to identify and separate different semantic uses of the same tag character string. The admirably ungoverned nature of hashtags naturally leads to collisions. For example, #ir primarily yields tweets related to Iran, which is not what I had in mind.

Another example: I’m an amateur (VERY amateur) painter with a particular interest in painting mediums. Too lazy to type #paintingMedium on my phone (I’m not alone in this, I see), I’m inclined to tag things with #medium, which tosses my lot in with scads of information on the TV show. These collisions aren’t a problem as I organize my own posts, but they would be if people wanted to search broadly for useful tags, jumping onto a tag in medias res.

What I’m suggesting is that it would be useful and interesting to tackle the complexity of hashtags in efforts to extend their utility.  A first step here would be to analyze the text that accompanies them.  But I suspect this wouldn’t be enough.  Would we need to consider the social structure in which tags are embedded?  I sense an opportunity here.
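
To give a flavor of that first step, here is a minimal sketch that splits the tweets sharing one tag into rough "senses" using nothing but the accompanying text. It uses a greedy single-pass clustering over Jaccard word overlap, which is an assumption chosen for brevity rather than a recommended method; the threshold, the stopword list, and the sample tweets are likewise made up. It ignores social structure completely.

```python
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "of", "as", "is", "on", "in", "to",
             "for", "and", "my", "this"}


def content_words(text):
    """Return the set of non-tag, non-stopword words in a tweet."""
    tokens = re.findall(r"#?\w+", text.lower())
    return {t for t in tokens if not t.startswith("#") and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_senses(tweets, threshold=0.2):
    """Greedy single-pass clustering: each tweet joins the most similar
    existing cluster, or starts a new one if nothing is similar enough."""
    clusters = []  # each cluster: {"vocab": set of words, "tweets": list}
    for tweet in tweets:
        words = content_words(tweet)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(words, cluster["vocab"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["vocab"] |= words
            best["tweets"].append(tweet)
        else:
            clusters.append({"vocab": set(words), "tweets": [tweet]})
    return [cluster["tweets"] for cluster in clusters]


# Made-up tweets that all carry the same colliding tag.
sample = [
    "linseed oil as a painting medium #medium",
    "new episode of the tv show tonight #medium",
    "trying walnut oil painting medium today #medium",
    "best tv show episode this season #medium",
]
senses = split_senses(sample)
```

On this toy input the painting tweets and the TV tweets land in separate groups. Real tweets are short and noisy enough that word overlap alone will often fail, which is part of why I suspect the text won't be enough.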