Snowball Sampling for Twitter Research

By way of shameless promotion, I am currently encouraging people to help me evaluate an experimental IR system that searches microblog (Twitter) data.  To participate, please see:

http://tacoma.lis.illinois.edu:8080/sparrow

Please consider giving it a moment…even a brief moment.

Now, on to a more substantive matter: I’ve been wrestling with the validity of testing an IR system (particularly a microblog IR system) using a so-called snowball sampling technique. For the uninitiated, snowball sampling involves recruiting a small number of people to participate in a study with the explicit aim that they will, in turn, encourage others to participate. The hope is that participation in the study will extend beyond the narrow initial sample as subjects recruit new participants.

Snowball sampling has clear drawbacks. Most obviously, it is ripe for introducing bias into one’s analysis. The initial “seed” participants drive the demographics of subsequent recruits, and this effect can amplify any initial bias. The non-random selection of initial participants (assuming it is non-random), together with their non-random selection of recruits, calls into question the application of standard inferential statistics at the end of the study. What status does a confidence interval on, say, user satisfaction derived from a snowball sample have with respect to the level of user satisfaction in the population?
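To make the statistical worry concrete, here is a minimal Python sketch (the ratings are invented, not data from this study) of the textbook interval computation whose guarantee a snowball sample undermines: the formula presumes an i.i.d. random sample from the population.

```python
import math

# Hypothetical 1-to-5 satisfaction ratings from a snowball sample
# (entirely made up for illustration).
ratings = [4, 5, 3, 4, 4, 2, 5, 4, 3, 4]

n = len(ratings)
mean = sum(ratings) / n
sd = math.sqrt(sum((r - mean) ** 2 for r in ratings) / (n - 1))
se = sd / math.sqrt(n)

# The usual 95% interval, mean +/- 1.96 * SE, earns its coverage
# guarantee only when observations are an i.i.d. random sample from
# the population. Snowball recruits are neither independent nor
# randomly selected, so the nominal 95% coverage is not warranted.
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f}, naive 95% CI = ({low:.2f}, {high:.2f})")
```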

However, snowball sampling has its merits, too.  Among these is the possibility of obtaining a reasonable number of participants in the absence of a tractable method for random sampling.

In my case, I have decided that a snowball sample for this study is worth the risks it entails. To avoid poisoning my results, I’ll keep my description of the project to a minimum.

But I feel comfortable saying that my recruiting method involves disseminating a call for participation through several venues:

  • Via a Twitter post, with a call for readers to retweet it.
  • Via this blog post!
  • By email to two mailing lists (one a student list, and the other a list of Twitter researchers).

In this case, the value of a snowball sample extends beyond simply acquiring a large N. Because Twitter users are connected by Twitter’s native subscription model, the fact that my sample will draw many users who are “close” to my social network strikes me as no liability. Instead it will, I hope, lend a level of realism to the study of how a particular sub-community functions.

One problem with these rose-colored lenses is that I have no way to characterize this sub-community formally.  Inferences drawn from this sample may generalize to some group.  But what group is that?

Obviously some of the validity of this sample will depend on the nature of the data collected and the research questions to be posed against it, neither of which I’m comfortable discussing yet. But I would be interested to hear what readers think: does snowball sampling carry merits or liabilities that are specific to research on systems built around social connections, ones that would not pertain in settings lacking explicit social linkage?


8 Comments on “Snowball Sampling for Twitter Research”

  1. andrea forte says:

    Some random thoughts…

    I’m not sure that retweeting exactly constitutes snowball sampling; I almost think it needs another name. I recently did a Twitter survey of teachers using this method; by my estimate it reached upwards of 20,000 Twitter accounts. Hell of a snowball. It was retweeted by some folks with a massive following, but who knows how many eyes actually saw it. I got 1,200 hits to my survey and 40 completed surveys. I’m calling that a 3% response rate, but how do I know what those clicks represent? Maybe only 80 of them actually fit the criteria I was looking for (K-12 teachers who actively use Twitter), and I have a 50% response rate?

    In any case, my personal feeling is that the people who completed the survey or participated in interviews were not consistently “close” to my network, but I am struggling with the same problem you mention here. What DO these people represent? Since this was intended to be an exploratory qualitative survey, I’m not fretting much; I got some valuable insights. But I’m interested in hearing what conclusions you draw.

    • milesefron says:

      Thanks for the feedback, Andrea. I agree that the method you (and I) are describing is likely to extend beyond the neighborhood of the researcher’s social circle, provided one gets a critical mass of RTs.

      Since we can easily track RTs, maybe there would be a way to characterize the group of people to whom the call was visible. That is, whose eyes might have seen the call? If we could do that satisfactorily, we might be able to induce some knowledge of the bias we’ve incurred. For example: what’s the probability that a given person would have seen my call given the actual seed sample, versus the estimated probability that he or she would have seen it given a “random” set of initial seeds? By taking a bootstrap approach (i.e. replicating the random seeding many times), we might be able to characterize how different our actual sample is from one that would emerge from a putatively random initial condition… that’s a lot of “mights”, though!
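      To make that concrete, here’s a minimal Python sketch of the idea. The follower graph, the retweet probability, and the one-seed-per-replicate scheme are all hypothetical placeholders, not anything from the actual study:

      ```python
      import random

      # Toy follower graph: user -> set of followers. In a real analysis
      # this would be crawled from the Twitter API; all names are made up.
      followers = {
          "me": {"a", "b", "c"},
          "a":  {"d", "e"},
          "b":  {"e", "f"},
          "c":  set(),
          "d":  {"g"},
          "e":  set(),
          "f":  {"g", "h"},
          "g":  set(),
          "h":  set(),
      }
      RETWEET_PROB = 0.3  # assumed chance that a viewer retweets the call

      def simulate_reach(seeds, rng):
          """One simulated cascade: which users might have seen the call?"""
          saw = set()
          frontier = list(seeds)
          while frontier:
              user = frontier.pop()
              for follower in followers.get(user, set()):
                  if follower not in saw:
                      saw.add(follower)
                      if rng.random() < RETWEET_PROB:
                          frontier.append(follower)  # passes the call onward
          return saw

      rng = random.Random(0)
      actual = simulate_reach({"me"}, rng)  # cascade from the real seed

      # Bootstrap-style replication: rerun the cascade from random seeds
      # and estimate each user's exposure probability under a "random"
      # initial condition, for comparison with the actual cascade.
      REPS = 1000
      exposure = {u: 0 for u in followers}
      for _ in range(REPS):
          seeds = rng.sample(sorted(followers), 1)
          for u in simulate_reach(seeds, rng):
              exposure[u] += 1

      for u in sorted(followers):
          print(f"{u}: in actual cascade: {u in actual}, "
                f"P(exposed | random seed) ~ {exposure[u] / REPS:.2f}")
      ```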

      Certainly more details to follow.

  2. Ian Soboroff says:

    How are you going to validate the method? Do you have a control sample planned?

    Also, have you controlled the search task at all? How will you know if the sessions are comparable?

    • milesefron says:

      Ian,

      The answer to your second question is no. This portion of the experiment is wide open. And you’re right, this fact does confound comparison across treatments.

      Given this consideration, the first question gets a qualified yes. The experiment is running as a bucket test, with some users getting a baseline and others getting a treatment. But since, as you point out, it’s difficult to compare across conditions, I need to be judicious in drawing conclusions.
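      For concreteness, the assignment step of such a bucket test might look like this sketch (hypothetical, not my actual implementation): hash a stable user identifier so that a returning user always lands in the same condition.

      ```python
      import hashlib

      def assign_bucket(user_id: str) -> str:
          """Deterministically assign a user to an experimental condition.

          Hashing the id, rather than flipping a coin per visit, keeps a
          returning user in the same bucket across sessions.
          """
          digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
          return "treatment" if int(digest, 16) % 2 == 0 else "baseline"

      print(assign_bucket("alice"))  # same user, same bucket, every time
      ```

      Deterministic assignment also keeps the buckets stable across sessions, which helps when one must be judicious about cross-condition conclusions.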

      What I think is most important is that this portion of the study is one among several modes of assessment. The others provide direct comparison, but they lack the realism obtainable by having people interact with the system as they see fit. My goal is to show the merits and liabilities of various approaches to the problem at hand through several lenses. This lens is intended to yield the most naturalistic data among the various methods.

  3. mantruc says:

    Miles,

    A few months ago I ran a survey with similar sampling methods. One thing that worked in my favor is that I was targeting a population for which people in my network, or people somewhat similar to me, were a good fit. I included some indicators to check whether respondents fit this description, plus demographic questions to get a better sense of who they were. In retrospect I would have included a couple more demographic markers.

    In two weeks I passed 1,200 responses, and later analysis of the data showed that I did actually hit the type of people I wanted. My sample’s bias shows clearly in the participants’ countries, but I still believe the ball fell quite far from just my friends… judging by the people who tweeted it and posted it on blogs, the age range, etc.

    One thing I learned that may be useful to you is that straightforward RTs are not the only way people will share your link: some will mention your @name and others won’t. Additionally, some people will create a new shortened URL when sharing the link (my guess is that they want to track their own influence). Others may post it on blogs or send it via email.

    Anyway, good luck with your research! Hope this helps.

    • milesefron says:

      Thanks. Yes, in my previous reply to Andrea’s comment I may have oversimplified the process of tracking the spread of the call. But it’s nice to hear that overall you were pleased with your results. It sounds like you applied some good ingenuity to the problem.

  4. Suh-hee says:

    As there are a lot of surveys done on Twitter, I would guess there are some academic papers regarding this issue. I wonder if you’ve seen any previous studies about sampling on Twitter.

