Term Frequency - Inverse Document Frequency (tf-idf)

This is an older, classical technique.

Term Frequency - Inverse Document Frequency (TF-IDF) is a widely used statistical method in natural language processing and information retrieval. It measures how important a term is to a document relative to a collection of documents (a corpus): a term scores highly when it occurs often in the document but appears in few documents overall.

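In the standard formulation, $w_{t,d} = \mathrm{tf}_{t,d} \cdot \log\frac{N}{\mathrm{df}_t}$, where:

  • $w_{t,d}$ – weight (relevance) of term $t$ in document $d$
  • $\mathrm{tf}_{t,d}$ – number of occurrences of term $t$ in document $d$
  • $\mathrm{df}_t$ – number of documents containing term $t$
  • $N$ – total number of documents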

The relevance of a document to a query is the sum of the weights of the query's terms in that document.
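
The weighting and scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes raw occurrence counts for term frequency, the natural logarithm for the inverse document frequency, and whitespace tokenization. The function names `tfidf_weights` and `score` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)  # total number of documents
    # df[t]: number of documents containing term t (each document counted once)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf[t]: number of occurrences of t in this document
        # Assumed variant: raw count times natural log of (N / df)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def score(query_terms, doc_weights):
    """Relevance of a document: sum of the weights of the query terms."""
    return sum(doc_weights.get(t, 0.0) for t in query_terms)


# Hypothetical toy corpus, tokenized by whitespace
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "dogs and cats are pets".split(),
]
weights = tfidf_weights(docs)
scores = [score(["cat", "mat"], w) for w in weights]
```

With this toy corpus, the first document ranks highest for the query because it contains both query terms, and "mat" occurs in only one document, so it carries more weight than "cat". Practical systems often use smoothed or normalized variants of the formula, which this sketch does not attempt.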