TF-IDF

TF-IDF is the acronym for Term Frequency - Inverse Document Frequency.

Term Frequency - Inverse Document Frequency

TF-IDF is a commonly used technique for information retrieval in text-based documents. It is a numerical statistic that reflects how important a word is to a document in a collection or corpus. It is based on two main factors:

  1. Term Frequency: The number of times a particular word (or term) appears in a document.
  2. Inverse Document Frequency: The logarithmically scaled inverse of the fraction of documents in the corpus that contain the word.
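
One common way to write these two factors is shown here. The symbols are assumptions introduced for illustration: t is a term, d is a document, D is the corpus, N is the number of documents in D, and f_{t,d} is the raw count of t in d. Other variants exist, for example ones that normalize the count by document length or add smoothing to the denominator.

```latex
\mathrm{tf}(t, d) = f_{t,d}
\qquad
\mathrm{idf}(t, D) = \log \frac{N}{\left|\{\, d \in D : t \in d \,\}\right|}
```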

The idea behind TF-IDF is that if a word appears frequently in a document but in few other documents in the corpus, then that word is likely to be important to that document. Conversely, if a word appears frequently across many documents, it is probably not very useful for distinguishing between them.

The TF-IDF value for a word in a document is calculated as the product of its TF and IDF values. The resulting TF-IDF score gives a high weight to terms that are frequent in the document but rare in the corpus, and a low weight to terms that are frequent in the corpus but not in the document. This helps to identify words that are important for distinguishing between documents and for information retrieval purposes.
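
As a minimal sketch of the calculation described above, the following Python computes TF-IDF scores for a tiny corpus. It assumes raw counts for TF, an unsmoothed log(N / df) for IDF, and naive whitespace tokenization. The function name `tf_idf` and the sample sentences are hypothetical, and production libraries typically use smoothed or normalized variants.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document.

    Assumes TF = raw count and IDF = log(N / df), with no smoothing.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        scores.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return scores


docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]
scores = tf_idf(docs)
```

In this toy corpus, "the" appears in every document, so its IDF is log(3/3) = 0 and its score is 0 everywhere. A word such as "mat", which appears in only one document, scores higher in that document than "cat", which appears in two.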

  • Abbreviation: TF-IDF