(function(doc, html, url) {
  var widget = doc.createElement("div");
  widget.innerHTML = html;
  var script = doc.currentScript;
  if (!script) {
    // Fallback when document.currentScript is unavailable: locate this script by its src URL.
    var scripts = doc.scripts;
    for (var i = 0; i < scripts.length; ++i) {
      script = scripts[i];
      if (script.src && script.src.indexOf(url) != -1) break;
    }
  }
  script.parentElement.replaceChild(widget, script);
}(document, '

Improving the BM25 retrieval model using the frequencies of the words in the document collection

What is it about?

Most retrieval models, including the popular TF-IDF and BM25, rely mainly on two statistics extracted from a document collection: 1) term frequency (TF), the number of times a word occurs in a document, and 2) document frequency (DF), the number of documents in which a word occurs. A third statistic, collection term frequency (CTF), the total number of times a word occurs in the whole collection, can also be used to improve these retrieval methods.
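As a small illustration of the three statistics, the Python sketch below counts TF, DF and CTF over a toy, already-tokenized collection. The function name collection_stats, the toy documents and the whitespace tokenization are assumptions made only for this example. In the output, "the" and "cat" have the same DF (2) but different CTFs (4 versus 2), which is the kind of extra information CTF carries beyond DF.

```python
from collections import Counter

def collection_stats(docs):
    """Return per-document TF counters plus DF and CTF counters for tokenized docs."""
    tf = [Counter(doc) for doc in docs]   # TF: occurrences of each word in one document
    df = Counter()                        # DF: number of documents containing the word
    ctf = Counter()                       # CTF: total occurrences across the collection
    for counts in tf:
        df.update(counts.keys())          # each document adds at most 1 per word
        ctf.update(counts)                # add this document's raw counts
    return tf, df, ctf

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a bird sang".split(),
]
tf, df, ctf = collection_stats(docs)
print(tf[0]["the"], df["the"], ctf["the"])  # 2 2 4
print(tf[1]["cat"], df["cat"], ctf["cat"])  # 1 2 2
```

Both DF and CTF are filled in the same single pass over the per-document counts.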

Why is it important?

BM25 is a very popular and effective retrieval method, and many attempts have been made to improve it. In the paper, we summarize these attempts and conclude that our approach produces significant and consistent relative improvements that are superior to those of previous approaches. Gathering and using CTFs costs the same as gathering and using DFs, so the approach is conceptually simple and easy to implement in systems already based on BM25.
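For reference, here is a minimal Python sketch of the standard Okapi BM25 baseline. It scores with TF and DF only. The CTF-based change proposed in the paper is not described in this summary, so it is deliberately left out. The defaults k1=1.2 and b=0.75, the non-negative IDF variant and the function name bm25_score are common choices assumed for illustration, not taken from the paper. The DF loop shows where CTF counts could be gathered in the same pass, which is consistent with the cost argument above.

```python
import math
from collections import Counter

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one tokenized document for a tokenized query.

    Uses the common non-negative IDF variant log(1 + (N - df + 0.5) / (df + 0.5)).
    Only TF and DF are used; no CTF-based modification is implemented here.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    df = Counter()
    for d in docs:
        df.update(set(d))                       # DF: count each word once per document
        # A CTF counter could be filled in this same pass with ctf.update(d).
    tf = Counter(doc)                           # TF of each word in the scored document
    score = 0.0
    for term in query:
        if tf[term] == 0:
            continue                            # query words absent from doc add nothing
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "a bird sang".split(),
]
scores = [bm25_score(["cat"], d, docs) for d in docs]
print(scores[1] > scores[0] > scores[2])  # True: doc 1 is shorter than doc 0, doc 2 lacks "cat"
```

Length normalization through b makes the shorter document rank higher when both contain the query word once.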

The following have contributed to this page:
Sergio Jiménez
' ,"url"));