Old Fox KM Journal

Friday, December 03, 2004

Searching


Precision, Ranking, and Recall - the Holy Trinity

In talking about search engines and how to improve them, it helps to remember what distinguishes a useful search from a fruitless one. For a search engine to be truly useful, we generally want three things from it:

We want it to give us all of the relevant information available on our topic.
We want it to give us only information that is relevant to our search.
We want the information ordered in some meaningful way, so that we see the most relevant results first.

The first of these criteria - getting all of the relevant information available - is called recall. Without good recall, we have no guarantee that valid, interesting results won't be left out of our result set. We want the rate of false negatives - relevant results that we never see - to be as low as possible.

The second criterion - the proportion of documents in our result set that is relevant to our search - is called precision. With too little precision, our useful results get diluted by irrelevancies, and we are left with the task of sifting through a large set of documents to find what we want. High precision means the lowest possible rate of false positives.
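
To make the two definitions concrete, here is a minimal sketch in Python. It assumes we already know which documents are truly relevant, which a real search engine never does; the function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one result set.

    retrieved: documents the search engine returned
    relevant:  documents that are actually relevant (assumed known here)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that we actually got back

    # Precision: share of the result set that is relevant (few false positives).
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of all relevant documents that we found (few false negatives).
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four relevant documents exist, the engine returns three.
relevant = {"d1", "d2", "d3", "d4"}
retrieved = ["d1", "d2", "d5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the three returned documents are relevant, so precision is 2/3. Two of the four relevant documents were found, so recall is 1/2.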

There is an inevitable tradeoff between precision and recall. Search results generally lie on a continuum of relevancy, so there is no distinct place where relevant results stop and extraneous ones begin. The wider we cast our net, the less precise our result set becomes. This is why the third criterion, ranking, is so important. Ranking has to do with whether the result set is ordered in a way that matches our intuitive understanding of what is more and what is less relevant. Of course, the concept of 'relevance' depends heavily on our own immediate needs, our interests, and the context of our search. In an ideal world, search engines would learn our individual preferences so well that they could fine-tune any search we made based on our past expressed interests and peccadilloes. In the real world, a useful ranking is anything that does a reasonable job distinguishing between strong and weak results. ...
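
The tradeoff can be seen in a toy example. The sketch below is only an illustration: the ranked list and the set of relevant documents are invented, and "casting the net wider" is modeled simply as keeping more of the top-ranked results.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall if we keep only the top k ranked results."""
    top = ranked[:k]
    hits = sum(1 for doc in top if doc in relevant)
    return hits / k, hits / len(relevant)


# Hypothetical ranking, best guess first; four documents are truly relevant.
ranked = ["d1", "d2", "d7", "d3", "d8", "d9", "d4", "d10"]
relevant = {"d1", "d2", "d3", "d4"}

# Widen the net step by step and watch the two measures move apart.
curve = [(k, *precision_recall_at_k(ranked, relevant, k)) for k in (2, 4, 8)]
for k, p, r in curve:
    print(f"top {k}: precision={p:.2f} recall={r:.2f}")
```

With this made-up data, the top 2 results give precision 1.00 and recall 0.50, the top 4 give 0.75 and 0.75, and all 8 give 0.50 and 1.00. Because the ranking puts most of the relevant documents near the top, a reader who stops early still sees mostly strong results.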

Good paper here.
