handling hashtags

Twitter hashtags are a great tool for improvised info organization–i.e. using software features to marshal information in ways that the feature designer didn’t think up (and made no pretense of thinking up).  In particular, I’ve been thinking of hashtags as a hack to support collaborative IR (information retrieval).  Need to research the size of Google Scholar’s index?  Mark relevant resources with, say, #search.GSsize.  Others interested in the same topic could add to the body of knowledge related to the search, and could follow the collected resources.

Of course this is what hashtags are for, so I’m not proposing anything very new here.

But this idea got me thinking of a few services that would support hashtag use for collaborative IR:

  1. Intelligent search for tags
  2. Hashtag disambiguation

Other services like recommendation also leap to mind.

By intelligent search, I’m thinking of a way to find tags that are relevant to a particular topic.  hashtags.org/tags already collects tags.  But as far as I know (please correct me if I’m wrong) existing hashtag search simply supports string matches.  It’s difficult to find semantically useful tags. This would frustrate any kind of real collaborative use of them.
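
To make the contrast with string matching concrete, here is a minimal sketch of one way topic-based tag search could work: describe each tag by the words that appear alongside it, then rank tags by how closely that description matches a free-text query. Everything in it is an assumption for illustration (the sample tweets, the function names, the choice of raw word counts with cosine similarity); it is not how hashtags.org or any existing service works.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase a tweet and split it into word and hashtag tokens."""
    return TOKEN.findall(text.lower())


def build_tag_profiles(tweets):
    """Map each hashtag to a bag of the non-tag words that co-occur with it."""
    profiles = defaultdict(Counter)
    for tweet in tweets:
        tokens = tokenize(tweet)
        tags = {t for t in tokens if t.startswith("#")}
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            profiles[tag].update(words)
    return profiles


def cosine(a, b):
    """Cosine similarity between two bags of words."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_tags(query, profiles, k=3):
    """Return up to k tags whose co-occurring text best matches the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, profile), tag) for tag, profile in profiles.items()]
    scored = [(score, tag) for score, tag in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tag for _, tag in scored[:k]]


# Hypothetical tweets, invented for this example.
tweets = [
    "How big is the Google Scholar index? #gsSize",
    "New estimate of Google Scholar index coverage #gsSize",
    "Great episode last night #medium",
]
profiles = build_tag_profiles(tweets)
print(search_tags("google scholar index size", profiles))
```

A plain string match on "google scholar" would never surface #gsSize, because those words appear nowhere in the tag itself; the match comes entirely from the accompanying text.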

As for hashtag disambiguation, I simply mean trying to identify and separate different semantic uses of the same tag character string.  The admirably ungoverned nature of hashtags naturally leads to collisions.  For example, #ir primarily yields tweets related to Iran, which is not what I had in mind.

Another example: I’m an amateur (VERY amateur) painter with a particular interest in painting mediums.  Too lazy to type #paintingMedium on my phone (I’m not alone in this, I see), I’m inclined to tag things with #medium, which tosses my lot in with scads of information on the TV show.  These collisions aren’t a problem as I organize my own posts, but they would be if people wanted to search broadly for useful tags, jumping onto a tag in medias res.

What I’m suggesting is that it would be useful and interesting to tackle the complexity of hashtags in efforts to extend their utility.  A first step here would be to analyze the text that accompanies them.  But I suspect this wouldn’t be enough.  Would we need to consider the social structure in which tags are embedded?  I sense an opportunity here.
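
As a rough illustration of that first step, the sketch below groups tweets that share a tag by how much their surrounding vocabulary overlaps, so that each group approximates one sense of the tag. The tweets, the stopword list, the threshold, and the greedy one-pass grouping are all assumptions made for the example. It uses text only and ignores social structure entirely, so it shows the starting point rather than a solution.

```python
import re

# A tiny, hand-picked stopword list; a real one would be much longer.
STOPWORDS = {"a", "as", "into", "my", "of", "that", "the", "vs", "was"}


def context_words(tweet, tag):
    """Return the set of words in a tweet, minus the tag itself and stopwords."""
    tokens = re.findall(r"#?\w+", tweet.lower())
    return {t for t in tokens if t != tag.lower() and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_senses(tweets, tag, threshold=0.2):
    """Greedy one-pass grouping of tweets by context-word overlap.

    A tweet joins the first group whose pooled vocabulary it overlaps
    by at least `threshold`; otherwise it starts a new group.
    """
    groups = []  # each entry is (pooled vocabulary, member tweets)
    for tweet in tweets:
        words = context_words(tweet, tag)
        for vocab, members in groups:
            if jaccard(words, vocab) >= threshold:
                vocab |= words  # in-place update of the group's vocabulary
                members.append(tweet)
                break
        else:
            groups.append((set(words), [tweet]))
    return [members for _, members in groups]


# Hypothetical tweets, invented for this example.
tweets = [
    "Mixing linseed oil into my paint #medium",
    "Watched the new episode of the show tonight #medium",
    "Walnut oil vs linseed oil as a paint thinner #medium",
    "That show episode was creepy #medium",
]
for sense in group_senses(tweets, "#medium"):
    print(sense)
```

On these four tweets the painting posts and the TV posts land in separate groups. With text this short, though, two tweets about the same sense can easily share no words at all, which is one reason text alone may not be enough.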


6 Comments on “handling hashtags”

  1. Richard says:


    Have you seen What the Hashtag?!? http://wthashtag.com
    This service lets you define what a hashtag means, disambiguate different hashtags and track their usage.

    Other services like http://tagal.us do the same. I’m not sure I’ve seen a head-to-head comparison of all these kinds of services, but I expect there are more out there. I also don’t know to what extent these rely on human-powered/social vs. algorithmic disambiguation.

    • milesefron says:

      Thanks for these… these sites seem to be doing something close to what I was thinking about, but not quite the same thing. If you look up #music on wthashtag.com you do indeed get information on that tag. They do also give you a recommended related tag (only one?).

      But what I don’t see here is the ability to find tags on a particular topic. If I type “collaborative information retrieval” into their search box, I don’t get any sensible results.

      Clearly, though, people are triangulating on this issue. Thanks again.

  2. wthashtag.com relies on manually-edited entries for hashtags. Brizzly.com does the same thing. This is the standard Wikipedia crowd-sourcing model: works well for popular hashtags, doesn’t help with others.

    Does anyone else see the irony of having to use text to describe tags?

  3. By the way, Miles, I think we’re on the same wavelength again: See Searching twitter

    • milesefron says:

      Interesting idea, inverse tag frequency. I bet calculating IDF for tags would yield a lot of noise. Also, the local weight, the TF weight, is much different in the case of tweets. Obviously no issue of document length normalization here. Do ‘important’ words occur often in a given tweet? I suspect not. Your point about trending terms seems spot on; this might need to take the place of both TF and IDF.

  4. Jon says:

    Miles, great ideas & sounds like a fun research project. My hunch is that, for automatic disambiguation methods, you would want to calculate TF/IDF-type metrics over a user’s whole collection (or recent collection) of tweets, rather than in an individual tweet. It’s less likely that a single user would use the same hashtag in ambiguous ways, and high frequency use across tweets is certainly significant. Some sort of “smoothing” with the immediate social network to alleviate sparsity issues and reinforce the intended sense of a hashtag would be nice to look into.

    Then, of course, you’re getting into lots of length normalization issues, and everything else we deal with in other forms of automatic text processing.
