Term Frequency - Inverse Document Frequency (TF-IDF)
An old, classical technique.
Term Frequency - Inverse Document Frequency (TF-IDF) is a widely used statistical method in natural language processing and information retrieval. It measures how important a term is to a document relative to a collection of documents (a corpus).
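$$w_{t,d} = tf_{t,d} \cdot \log\frac{N}{df_t}$$
- $w_{t,d}$ – weight (relevance) of term $t$ in document $d$
- $tf_{t,d}$ – number of occurrences of term $t$ in document $d$
- $df_t$ – number of documents containing term $t$
- $N$ – total number of documents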
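The weight formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes documents are already tokenized into lists of terms (here via a naive whitespace split), uses raw counts for term frequency and the natural logarithm for the IDF factor, and the function name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document, using w = tf * log(N / df)."""
    n_docs = len(docs)  # N: total number of documents
    # df: number of documents containing each term (each term counted once per document)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf: occurrences of each term in this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Toy corpus, tokenized by a naive whitespace split (an assumption for illustration)
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tfidf_weights(corpus)
print(round(w[0]["cat"], 3))  # 1 * log(3/1) -> 1.099
print(round(w[0]["the"], 3))  # 2 * log(3/2) -> 0.811
```

Note that a rare term ("cat", in one document) outweighs a frequent one ("the", in two documents) even though "the" occurs twice. A term that appears in every document gets $\log(N/N) = 0$ and contributes nothing.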
The relevance of a document to a query is the sum of the TF-IDF weights of the query's terms in that document.
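Scoring and ranking documents for a query then amounts to summing the weights of the query terms per document. The sketch below is hypothetical and makes the same assumptions as a basic TF-IDF setup: pre-tokenized text, raw counts, natural logarithm. It also assumes each distinct query term is counted once; the function name `score` is illustrative only.

```python
import math
from collections import Counter


def score(query: list[str], docs: list[list[str]]) -> list[float]:
    """Relevance of each document: sum of tf * log(N / df) over the distinct query terms."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Query terms missing from the document (tf = 0) contribute nothing;
        # if tf > 0 then df >= 1, so the division is always safe.
        scores.append(
            sum((tf[t] * math.log(n_docs / df[t]) for t in set(query) if tf[t] > 0), 0.0)
        )
    return scores


corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = score(["the", "dog"], corpus)
print([round(s, 3) for s in scores])  # [0.811, 1.91, 0.0]

# Rank documents by descending relevance
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 0, 2]
```

The second document ranks first because it contains both query terms, and "dog" is rare in the corpus. The third scores zero because this sketch does exact term matching, so "dogs" does not match "dog".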