Size of Google Scholar’s Index?

I’m writing a semi-detailed blog post to counter some recent arguments about the quality of data in Google Scholar.  I don’t have much stake in defending Google here, but I’ve seen some egregious straw man arguments and vacuous statistics bandied around.

To make this argument compelling, though, it would help to have a rough idea of how many documents the Google Scholar index contains.  I twittered about this yesterday, but thought this venue would have a wider reach.

Any comments on the matter would be most helpful.  Even suggestions on the order of magnitude would probably be sufficient.

If the size of the index isn’t obvious, maybe others have ideas about how to estimate it.
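One generic way to estimate the size of a collection you can only sample from is capture-recapture (the Lincoln-Petersen estimator): draw two independent samples of documents, count how many documents appear in both, and scale up. What follows is only a minimal sketch under strong assumptions, not a tested method for Google Scholar. It assumes the two samples are roughly uniform and independent, which query-based sampling of a search engine rarely guarantees. The function name and the toy document IDs are hypothetical.

```python
def lincoln_petersen(sample_a, sample_b):
    """Estimate collection size from two independent samples of document IDs.

    Assumption: both samples are drawn (roughly) uniformly at random from
    the same collection. Biased samples will skew the estimate.
    """
    a, b = set(sample_a), set(sample_b)
    overlap = len(a & b)  # documents seen in both samples
    if overlap == 0:
        raise ValueError("samples share no documents; cannot estimate size")
    # N is approximately |A| * |B| / |A ∩ B|
    return len(a) * len(b) / overlap


# Toy example with made-up IDs: two samples of 100 documents sharing 10.
estimate = lincoln_petersen(range(0, 100), range(90, 190))
print(estimate)  # 1000.0
```

Even with biased samples, an estimate like this may be enough to pin down the order of magnitude.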


3 Comments on “Size of Google Scholar’s Index?”

  1. Jon says:

    well, there’s a bit of federated search literature about estimating collection sizes. see for a start

  2. Fred says:

    Assuming the index size isn’t obvious, this is actually an interesting challenge. This article might be of use:

    Ferguson, D. A. (2009). Name-Based Cluster Sampling. Sociological Methods & Research, 37(4), 590-598.

    The drawback of having “name” as PSU is that it doesn’t reflect the “globality” of the corpus. I wonder if there’s some GSS data that could be used so you could use Country/Region as your PSU (and then within that sample Names).

    I don’t know how you feel about Heckman, but this might be a case where such a correction would be useful if you’re going to model estimates, but I’m honestly not sure how names distribute…seems exponential.

  3. […] particular, I’ve been thinking of hashtags as a hack to support collaborative IR.  Need to research the size of Google Scholar’s index?  Mark relevant resources with, say, #search.GSsize .  Others interested in the same topic could […]
