This morning the New York Times ran an article describing a newly minted approach by researchers at U. of Washington for analyzing the process of protein folding. A more thorough discussion appears in Nature.
To avoid the burdensome statistical modeling that is the norm in the field (and about which I confess to knowing nearly nothing), the researchers developed Foldit, a game inviting amateurs to help in this work. Quoting the Times on Foldit:
The game, which was competitive and offered the puzzle-solving qualities of a game like Rubik’s Cube, quickly attracted a dedicated following of thousands of players.
In other words, the researchers crowdsourced the problem by posing the work as a game. Aside from sidestepping heavy computation, the researchers found that Foldit led to a level of accuracy on a par with established methods.
With a lot of current interest in crowdsourcing for IR (see Omar Alonso’s slides from an ECIR tutorial and proceedings from the SIGIR2010 crowdsourcing workshop), the article raises the question (for me): what retrieval work could be approached in this way? Of course this question isn’t new; cf. Google’s image labeler. But it’s still resonant.
Gathering relevance judgments via Mechanical Turk is an obvious place for crowdsourcing to enter IR research. But this type of crowdsourcing is qualitatively different from what we see in Foldit: participants are paid to do work that presumably they otherwise wouldn’t do. Not only is this model limiting; it’s also ripe for people to fudge the process in order to get paid without completing the task appropriately.
Opening research problems to crowds in the form of games strikes me as a way to mitigate the problem of people gaming (sorry for the pun) the system. The approach might also help us expand the scope of problems that can be aided by crowdsourcing. In Foldit, user interaction is abstracted in a way that makes it difficult for people to cheat. And without a work/payment model, there’s little incentive to do the job poorly. Most striking, though: Foldit has broken a tremendously complex problem into sub-problems whose solutions make plausible entertainment.
What IR problems lend themselves to this kind of crowdsourcing? The image labeler is certainly one example, though I personally found it about as fun as waiting in an airport terminal for a delayed flight.
What else could we do in this space?
To start what I hope might become a discussion, I’ll offer a few criteria that I think a compelling crowdsourcing game should meet:
- Instant feedback: the game should give information to the player at all times. A real-time display of a performance-based score might do the trick.
- Abandonment & restarting: players should be able to quit or start a game at any time while still making their participation useful.
- Level of difficulty: obviously the game should be neither too hard nor too easy to be enjoyable. Better yet, let the player choose his or her preferred level of challenge (e.g. working on a larger or smaller part of the problem).
- Manageable chunks of work: Foldit operates by presenting the player with ‘puzzles.’ These are scenarios that involve solving a well-defined problem such as freeing atoms or moving a chain from an unsuitable location to a better spot. Each of these problems is solvable and discrete.
Of course this list is only the sketchiest effort. I’m curious whether others have more and better ideas. And of course the real question is how all this can be made to work in IR settings. What problems in IR lend themselves to this kind of solution? If we identify such problems, how do we transform the work into a viable ‘game’ that people would undertake voluntarily and to good effect?
I’m writing a semi-detailed blog post to counter some recent arguments about the quality of data in Google Scholar. I don’t have much stake in defending Google here, but I’ve seen some egregious straw man arguments and vacuous statistics bandied around.
To make this argument compelling, though, it would help to have a rough idea of how many documents the Google Scholar index contains. I twittered about this yesterday, but thought this venue would have a wider reach.
Any comments on the matter would be most helpful. Even suggestions on the order of magnitude would probably be sufficient.
If the size of the index isn’t obvious, maybe others have ideas about how to estimate it.
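One standard trick from the web-search literature is a capture-recapture (Lincoln-Petersen) estimate: draw two independent random samples of documents from the index and use the size of their overlap. The numbers below are purely illustrative, not actual Scholar data; a minimal sketch:

```python
# Capture-recapture (Lincoln-Petersen) sketch for estimating index size.
# sample_a and sample_b stand for two independent random samples of
# document IDs drawn from the index; all figures here are hypothetical.

def lincoln_petersen(sample_a, sample_b):
    """Estimate population size as |A| * |B| / |A intersect B|."""
    overlap = len(set(sample_a) & set(sample_b))
    if overlap == 0:
        raise ValueError("no overlap; samples too small to estimate")
    return len(sample_a) * len(sample_b) / overlap

# Toy illustration: two samples of 1,000 docs sharing 10 docs
# yield an estimated index of 1000 * 1000 / 10 = 100,000 docs.
print(lincoln_petersen(range(0, 1000), range(990, 1990)))  # → 100000.0
```

The hard part in practice, of course, is drawing anything like a uniform random sample from a search engine you don’t control, so even this would only bound the order of magnitude.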
I just came back from a talk in the stat department. The speaker was Brad Efron (yes, he’s my dad). The title of the talk was “The future of indirect evidence.” A proto-paper version is available in PDF.
The talk concerned some very specific points of relationship and deviation between frequentist and Bayesian statistics. It’s too reductive to say that the talk tried to marry them, though there was some flavor of that, especially in the context of empirical Bayes methods. But I think it’s accurate to say that Brad argued that the kind of information that we usually think of in terms of Bayesian priors is not anathema to frequentist methods. His umbrella term for this is ‘indirect evidence.’
As an example, he offered this graph:
This is a standard result from classical statistics: fitting a linear regression to a sample. Brad argued, however, that despite its obvious frequentism, analysis of this kind does rely on indirect evidence. That is, even here we’re bringing belief (though not strictly prior belief) to prediction.
In his example, we wish to predict the kidney function of a 55 year-old. The red dot indicates the score of the lone 55 year-old in the study. An analysis based on only direct evidence would thus use his score as the prediction. But of course statisticians are more comfortable with the prediction that lies on the regression line. Thus the canonical prediction for a 55 year-old relies on evidence only indirectly related to the kidney function of a 55 year-old.
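The contrast between the two predictions is easy to make concrete. The data below are invented for illustration (they are not the study’s actual values); the sketch fits an ordinary least-squares line by hand and compares the lone 55-year-old’s score with the prediction on the fitted line:

```python
# Direct vs. indirect evidence in a simple regression (invented data).
# The 55-year-old's score is deliberately an outlier below the trend.

ages   = [25, 30, 35, 40, 45, 50, 55, 60, 65]
scores = [3.1, 2.4, 1.8, 1.0, 0.2, -0.5, -2.9, -1.8, -2.5]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(scores) / n

# Closed-form ordinary least squares for one predictor.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, scores)) \
        / sum((x - mean_x) ** 2 for x in ages)
intercept = mean_y - slope * mean_x

direct = scores[ages.index(55)]    # prediction from direct evidence only
indirect = intercept + slope * 55  # prediction from the fitted line

print(f"direct evidence: {direct:.2f}, regression line: {indirect:.2f}")
```

With these made-up numbers the lone observation sits at -2.9, while the fitted line predicts roughly -1.4: the regression pulls the prediction toward what the other ages tell us, which is exactly the borrowing of indirect evidence Brad described.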
I’ve not done the topic justice. But the reason I’ve labored over the point is that the thrust of the talk applies immediately to IR. Brad argued that classical statistics was developed in the 19th and 20th centuries for data that was common in those eras. Now data of high dimensionality and tremendous sample size are common; IR certainly falls into this camp.
The challenge, we were told, was that contemporary data sets make indirect evidence unignorable. Bayesian approaches offer a response to this problem, but not the only response. In particular, the matter of empirical Bayes strikes me as uniquely suited to IR.
In a future post I plan to consider how an empirical Bayes approach would apply to a common problem in IR: smoothing a language model. I think that this simple task is a good starting point for this analysis.
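As a preview of why that pairing seems natural: Dirichlet-prior smoothing can already be read in empirical Bayes terms, since the collection language model p(w|C), estimated from the data itself, plays the role of the prior. A toy sketch (the corpus and the choice mu = 2000 are illustrative assumptions, not anything from the talk):

```python
from collections import Counter

# Dirichlet-prior smoothing of a unigram document language model:
#   p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu)
# where p(w|C) is the collection model, acting as an empirically
# estimated prior. Toy corpus throughout.

collection = ["the cat sat on the mat", "the dog chased the cat", "a bird flew away"]
doc = "the cat sat"

coll_counts = Counter(w for d in collection for w in d.split())
coll_total = sum(coll_counts.values())
doc_counts = Counter(doc.split())
doc_len = sum(doc_counts.values())
mu = 2000  # Dirichlet concentration; a common default in IR experiments

def p_dirichlet(word):
    """Smoothed probability of `word` under the document's language model."""
    p_coll = coll_counts[word] / coll_total
    return (doc_counts[word] + mu * p_coll) / (doc_len + mu)

# A word unseen in the document still gets nonzero probability,
# borrowed from the collection "prior."
print(p_dirichlet("dog"))
```

The empirical Bayes reading is that mu and p(w|C) aren’t subjective beliefs; they are estimated from the corpus itself, which is the kind of indirect evidence the talk was about.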
I’ve been reading many blog posts recently on HCIR and cognate problems, due in no small part to the upcoming HCIR conference and the CFP for the 2nd Workshop on collaborative IR. But a really clear, high-level articulation of the key factors in HCIR is laid out in Daniel Tunkelang‘s new piece in the ASIST Bulletin, “Reconsidering Relevance and Embracing Interaction.”
Besides a compelling overview of HCIR’s motivations (especially wrt the problematic status of relevance in many IR settings), Daniel offers three hallmarks of HCIR, at least if HCIR is done well. Systems, Tunkelang suggests, should strive for:
- transparency: Communicate why the retrieved documents were retrieved.
- control: Allow the searcher to express (and revise) his or her information need in a way that bears directly on what’s communicated through the transparency mechanisms.
- guidance: Shepherd searchers through the process of translating information needs into tractable queries.
Of course Daniel’s essay does a better job of describing these imperatives than I have done here. Check it out.