Bayes, Fisher and indirect evidence
Posted: October 15, 2009
I just came back from a talk in the stat department. The speaker was Brad Efron (yes, he’s my dad). The title of the talk was “The future of indirect evidence.” A proto-paper version is available in PDF.
The talk concerned some very specific points of relationship and deviation between frequentist and Bayesian statistics. It’s too reductive to say that the talk tried to marry them, though there was some flavor of that, especially in the context of empirical Bayes methods. But I think it’s accurate to say that Brad argued that the kind of information that we usually think of in terms of Bayesian priors is not anathema to frequentist methods. His umbrella term for this is ‘indirect evidence.’
As an example, he offered this graph:
This is a standard result from classical statistics: fitting a linear regression to a sample. Brad argued, however, that despite its obvious frequentism, analysis of this kind does rely on indirect evidence. That is, even here we’re bringing belief (though not strictly prior belief) to prediction.
In his example, we wish to predict the kidney function of a 55-year-old. The red dot indicates the score of the lone 55-year-old in the study. An analysis based on only direct evidence would thus use his score as the prediction. But of course statisticians are more comfortable with the prediction that lies on the regression line. Thus the canonical prediction for a 55-year-old relies on evidence only indirectly related to the kidney function of a 55-year-old.
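To make the contrast concrete, here is a minimal sketch in Python. The numbers are made up for illustration and are not the data from Brad's graph; the function name fit_line is likewise my own. It compares the "direct" prediction (the lone 55-year-old's score) with the prediction read off a least-squares line fit to everyone in the sample.

```python
# Synthetic (age, kidney score) pairs, invented for illustration only.
# They are NOT the data from the talk.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
scores = [2.1, 1.4, 1.0, 0.2, -0.3, -1.1, 0.9, -2.2, -2.6, -3.4]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, scores)

# Direct evidence: the score of the one 55-year-old in the sample.
direct = scores[ages.index(55)]

# Indirect evidence: the fitted line, which pools subjects of every age.
indirect = intercept + slope * 55

print(f"direct: {direct:.2f}, regression: {indirect:.2f}")
```

In this toy sample the 55-year-old happens to score well above the trend, so the two predictions disagree. The regression answer leans on nine people who are not 55, which is exactly the sense in which it uses indirect evidence.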
I’ve not done the topic justice. But the reason I’ve labored over the point is that the thrust of the talk applies directly to information retrieval (IR). Brad argued that classical statistics was developed in the 19th and 20th centuries for the kinds of data that were common in those eras. Now data of high dimensionality and tremendous sample size is common, and IR certainly falls into this camp.
The challenge, we were told, was that contemporary data sets make indirect evidence unignorable. Bayesian approaches offer a response to this problem, but not the only response. In particular, the matter of empirical Bayes strikes me as uniquely suited to IR.
In a future post I plan to consider how an empirical Bayes approach would apply to a common problem in IR: smoothing a language model. I think that this simple task is a good starting point for this analysis.
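As a rough preview of the flavor, and not the analysis promised above, here is a minimal sketch of Dirichlet-prior smoothing. The prior over words is the pooled distribution of the whole collection, so it is estimated from the data rather than chosen in advance. The toy documents, the function name smoothed_prob, and the value of mu are all assumptions for illustration; a fuller empirical Bayes treatment would estimate mu from the collection as well, where this sketch fixes it by hand.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "bayes prior posterior bayes".split(),
    "regression line prediction".split(),
    "prior evidence prediction regression".split(),
]

# The "prior": word probabilities pooled over the whole collection.
collection_counts = Counter(w for doc in docs for w in doc)
collection_total = sum(collection_counts.values())


def smoothed_prob(word, doc, mu=10.0):
    """Dirichlet-smoothed estimate of p(word | doc).

    mu is an assumed prior weight: larger values pull the estimate
    toward the collection-wide distribution.
    """
    p_collection = collection_counts[word] / collection_total
    return (doc.count(word) + mu * p_collection) / (len(doc) + mu)


doc = docs[0]
print(smoothed_prob("bayes", doc))       # word seen in this document
print(smoothed_prob("regression", doc))  # unseen here, but still nonzero
```

A word that never occurs in a document still gets some probability, borrowed from the other documents in the collection. That borrowing is indirect evidence in Brad's sense.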