meaningful text analysis
Posted: October 1, 2009
Last night I had dinner with a group of visiting scholars from Germany who are part of the TextGrid project. TextGrid is an effort to bring grid computing to bear on digital humanities research. We spent the evening talking less about grid technologies than about humanities computing in general. The conversation also turned to the Monk project, with which our gracious host John Unsworth is closely involved.
The thrust of our discussion lay in what computing does, can, should, and cannot offer to the study of humanistic data.
The interesting question is, what should humanities computing be?
Kirsten Uszkalo was especially keen on the application of sentiment analysis to the work she does on early modern English literature. But I wonder whether the already-difficult problem of identifying, say, positive and negative product reviews isn’t qualitatively different in the context of 16th-century popular literature.
Consider one example that we discussed: reports of demonic possession. It struck me that a humanist is unlikely to be compelled by a classifier that achieves n% accuracy in separating medical and theological treatments of possession. Instead, the interesting question in this case lies in identifying the textual features that would enable such a classifier. That is, what aspects of a text (vocabulary, physical dimensions, print type, etc.) speak to a meaningful difference in discourses?
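To make the point about features a little more concrete, here is a rough sketch of how one might surface the vocabulary that separates two discourses, rather than just reporting an accuracy figure. Everything in it is an assumption for illustration: the snippets in `MEDICAL` and `THEOLOGICAL` are invented toy text, not real sources, and the helper names (`word_counts`, `log_odds`, `top_features`) are hypothetical. It ranks words by a smoothed log-odds ratio, which is only one of many possible measures, and it looks at vocabulary alone; features like physical dimensions or print type would need metadata that this sketch does not model.

```python
import math
from collections import Counter

# Hypothetical toy snippets standing in for two discourses; not real sources.
MEDICAL = [
    "the patient suffered fits and a fever of the humours",
    "the physician judged the fits a disease of the body",
]
THEOLOGICAL = [
    "the devil tormented the maid and the minister prayed",
    "by prayer and fasting the devil was cast out of the maid",
]


def word_counts(docs):
    """Count whitespace-separated, lowercased tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def log_odds(docs_a, docs_b, alpha=1.0):
    """Smoothed log-odds of each word in class A versus class B.

    Positive scores lean toward A, negative toward B. `alpha` is
    add-alpha smoothing so unseen words do not produce log(0).
    """
    a, b = word_counts(docs_a), word_counts(docs_b)
    vocab = set(a) | set(b)
    total_a = sum(a.values()) + alpha * len(vocab)
    total_b = sum(b.values()) + alpha * len(vocab)
    scores = {}
    for word in vocab:
        p_a = (a[word] + alpha) / total_a
        p_b = (b[word] + alpha) / total_b
        scores[word] = math.log(p_a / p_b)
    return scores


def top_features(scores, k=3):
    """Return (k words most typical of A, k words most typical of B)."""
    ranked = sorted(scores.items(), key=lambda kv: (kv[1], kv[0]))
    return ranked[-k:][::-1], ranked[:k]


if __name__ == "__main__":
    medical_words, theological_words = top_features(
        log_odds(MEDICAL, THEOLOGICAL)
    )
    print("leans medical:", medical_words)
    print("leans theological:", theological_words)
```

The ranked word lists, not the classifier's score, are the output a humanist would actually read, and deciding whether they mark a meaningful difference in discourses remains an interpretive question.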
I came away from the dinner wondering where the problem of feature creation, selection, reduction, etc. fits into humanities computing. To what extent is feature selection a computing problem at all? Maybe the features that would inform a classifier are the aim of the humanist in the first place.