Inverted Index

Learned in CS451.

A posting is a record that associates a specific word (or term) with a set of documents in which that word appears.

Terminology:

  • Inverted Index: maps terms → documents.
  • Forward Index: maps documents → terms. (Seems strange, so a book’s index is “inverted”?)

  • so the word is simply a mapped to in which document it is in

Postings size follows Zipf’s Law:

  • A few terms occur very frequently
  • Many terms occur very infrequently
  • Like the 80-20 Rule

Instead of using that structure, use a linked list.