API search

Checking out the recently opened infochimps API reminded me of an issue that has been on my research backburner, but also on my radar for a while.  Here’s the question: given the proliferating number of web app APIs available to developers, would it be useful to build a search service that helps people find / explore available APIs?  I think it would.

Maybe such a thing already exists; trying to find it with a web search along the lines of “api search” is unhelpful for obvious reasons.  I’d be curious if any readers could offer pointers.

What I’m envisioning is a service that supports some hybrid of search and browsing with the aim of helping developers find APIs that provide access to data, functions, etc. that will help in completing a particular task.

For instance, I’m working on a system that helps people manage the searches they perform over time, across systems, on various devices, etc.  My feeling is that people’s experience with IR systems creates information that is often lost.  Some previous research of course addresses this issue… but that is for another post.

This is a pretty nebulous idea, and it’s not obvious to me what data and services are available to support its development.

I’d like to have access to a system that helps me explore:

  • data sets exposed via publicly available APIs.
  • APIs or API functions x, y, and z that are similar to some API function a.
  • restrictions / terms of service for possibly useful APIs (e.g. rate limits, attribution, re-ranking of search results).
  • API documentation.
  • libraries available for working with APIs.
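
To make the wish list above a little more concrete, here is a minimal sketch of what a catalog record and a naive search over it might look like.  Everything in it is an assumption made for illustration: the ApiRecord fields, the search and similar_functions helpers, and the two placeholder entries (ExampleGeo and ExampleText are made up, not real services).  Term overlap and Jaccard similarity stand in for whatever ranking a real system would need.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ApiRecord:
    """One catalog entry. Every field name is a hypothetical choice."""
    name: str
    description: str
    datasets: list[str] = field(default_factory=list)
    functions: dict[str, str] = field(default_factory=dict)  # function name -> summary
    terms: dict[str, str] = field(default_factory=dict)      # e.g. rate limits, attribution
    docs_url: str = ""
    libraries: list[str] = field(default_factory=list)

    def text(self) -> str:
        # Flatten the searchable fields into one string.
        parts = [self.name, self.description, *self.datasets,
                 *self.functions.keys(), *self.functions.values(),
                 *self.libraries]
        return " ".join(parts)


def tokens(s: str) -> set[str]:
    # Lowercased alphanumeric tokens; no stemming or stopword removal.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def search(catalog: list[ApiRecord], query: str) -> list[ApiRecord]:
    # Rank records by how many query terms they contain; drop non-matches.
    q = tokens(query)
    scored = [(len(q & tokens(rec.text())), rec) for rec in catalog]
    scored.sort(key=lambda pair: -pair[0])
    return [rec for score, rec in scored if score > 0]


def similar_functions(catalog: list[ApiRecord], summary: str, top_n: int = 3):
    # Rank every function in the catalog by Jaccard similarity between
    # its summary and the summary of the function we already know.
    target = tokens(summary)
    ranked = []
    for rec in catalog:
        for fname, fsummary in rec.functions.items():
            other = tokens(fsummary)
            union = target | other
            score = len(target & other) / len(union) if union else 0.0
            ranked.append((score, rec.name, fname))
    ranked.sort(key=lambda item: -item[0])
    return [(api, fn) for score, api, fn in ranked[:top_n] if score > 0]


# Placeholder entries, invented purely for illustration.
catalog = [
    ApiRecord(
        name="ExampleGeo",
        description="Placeholder geocoding service",
        datasets=["postal codes"],
        functions={"geocode": "convert a street address to coordinates"},
        terms={"rate_limit": "hypothetical: 1000 calls per day"},
        libraries=["examplegeo-py"],
    ),
    ApiRecord(
        name="ExampleText",
        description="Placeholder text analysis service",
        functions={"keywords": "extract keywords from a document"},
        terms={"attribution": "hypothetical: required"},
    ),
]

if __name__ == "__main__":
    print([rec.name for rec in search(catalog, "geocoding address")])
    print(similar_functions(catalog, "turn an address into coordinates"))
```

Even this toy version suggests where the hard parts are: the terms-of-service field is free text here, and comparing restrictions across APIs would need a much more careful representation.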

Aside from the practical value of such a system, I think API retrieval raises interesting research questions and problem areas.  For instance, what kinds of queries (or for that matter what kinds of information needs) can we expect to deal with in this arena? What factors make a particular unit of retrieval relevant? What features of these units offer traction in pursuit of retrieval? What kind of crawling / data acquisition steps do we need to address to move this agenda forward?

I suspect that addressing these problems is as much an HCI problem as it is a core IR issue.  Presenting germane information about APIs in a consistent fashion that allows for comparison and coherent exploration strikes me as a tall challenge.


IRB and (social) information research

Having returned from CHI and the CHI2010 microblog research workshop, I’m jazzed–new problems to tackle, studies to run.  In other words, the conference did just what it should; it gave me ideas for new research projects.

One of these projects is time-sensitive (I can’t go into detail because doing so will bias the results.  More on that later.)  As they put it on the Twitter search page, it’s what’s happening right now.  More seriously, the questions need to run within a few days of CHI’s end.  But the study will involve asking real people a few questions.  For a researcher at a university, this means that I must get human subjects approval from my local institutional review board (IRB).

It’s easy to kvetch about IRBs.  See the Chronicle of Higher Ed’s piece, Do IRB’s Go Overboard?  In fact, I’ve found the IRB officers at my institution to be extremely helpful, so I’m not going to kvetch (thinking of strategic ways of posing IRB applications recently led me to the very interesting IRB Review Blog, which offers nuanced, substantive reflections on the subject).

As anyone who has sat through a university’s research ethics training knows, IRBs were created in the wake of several odious and damaging studies.  This motivation is clear and impeccable.

But for those of us working in research related to information use, especially in domains such as IR, HCI, and social informatics broadly construed, the risk of damage or exploitation of subjects is often (though not always; privacy issues can be problematic) minimal.

But more interestingly, I think our work challenges the basic model that underpins contemporary research practice in the university.

My point in writing this post is not to argue that we should occupy a rarefied, unsupervised domain.  But recently I’ve dealt with several particular matters that suggest that research on information behavior (mostly HCIR work) pushes some matters to the fore that I think will soon be more general.  The following is a brief list.  I invite elaborations or arguments.

  1. crowd-sourced studies.  Services like Amazon’s Mechanical Turk offer a huge opening for IR research, as an upcoming SIGIR workshop makes clear.  What is the status of turkers with respect to human subjects approval?  In a future post I’ll describe in detail my own experience shepherding an MTurk-based study through university approval channels.
  2. search log analysis.  This isn’t a new problem with respect to IRB, and it definitely does raise issues of privacy.  But I wonder where more broadly informed studies of user behavior fit into this picture.  As an example, I was recently given permission to use a set of query logs without human subjects approval.  These logs were already in existence; I got them from a third party.  However, in a new study I want to collect logs from my own system.  Initial interaction with IRB led to the decision that this work must go through the application process.  Likewise, clickthrough data raised red flags.
  3. real-time user studies.  As I mentioned above, I’m in a situation where I need to collect information (essentially survey data) from Twitter users now.  Until very recently the subject of this “survey” didn’t exist, and it won’t exist in any meaningful sense for long.  I anticipate that this issue will be common for me, and perhaps for others.

Again, my point in writing this is not to say that I should have carte blanche to do research outside of normal channels.  What I am saying is twofold:

  1. Research on information interactions is pushing the limits of the current human subjects/IRB model used by most universities.  This is evidenced by unpredictable judgments on the status of projects.
  2. I think the community of researchers in “our” areas would do well to consider strategies for approaching IRB and other institutional hurdles.  We don’t want to game the system.  But I think the way we describe the work we do has an impact on the status of that work.  If current models are going to change, it would be great if we could (by our interactions with relevant officers) influence those changes in a positive way.