Cranfield

I first ran into this while reading Ranked Retrieval. I was a little confused about how it works, but here is my best writeup:

The Cranfield collection includes relevance judgments, i.e.:

  • For each query (e.g. “aircraft fatigue”), human assessors read the documents and decided which ones were relevant.

These judgments are published along with the dataset (basically, a big list of query–doc pairs where the doc is marked “relevant”).
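To make the "big list of query–doc pairs" concrete for myself, here is a minimal sketch of how I picture it in Python. The doc IDs and the second query are made up for illustration (they are not real Cranfield judgments), and `build_qrels` is just a name I picked; the actual file layout of the published judgments may well differ.

```python
from collections import defaultdict

# Toy judgments: (query, doc_id) pairs that assessors marked "relevant".
# ASSUMPTION: these IDs are invented for illustration, not taken from Cranfield.
judged_relevant = [
    ("aircraft fatigue", 12),
    ("aircraft fatigue", 47),
    ("aircraft fatigue", 103),
    ("boundary layer flow", 47),
]


def build_qrels(pairs):
    """Group (query, doc_id) pairs into {query: set of relevant doc IDs}."""
    qrels = defaultdict(set)
    for query, doc_id in pairs:
        qrels[query].add(doc_id)
    return dict(qrels)


qrels = build_qrels(judged_relevant)

# The "relevance list" for one query is then a simple set lookup.
print(sorted(qrels["aircraft fatigue"]))  # [12, 47, 103]
```

Keeping a set per query means checking "is this doc in the relevance list for this query?" is a single membership test.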

So, for the term “fatigue” and the query “aircraft fatigue”:

  • Look at all docs containing “fatigue” → that gives n.
  • Look at how many of those docs are in the relevance list for the query → that gives