Abstract
The most prevalent experimental methodology for comparing the effectiveness of information retrieval systems requires a test collection, composed of a set of documents, a set of query topics, and a set of relevance judgments indicating which documents are relevant to which topics. It is well known that relevance judgments are not infallible, but recent retrospective investigation into results from the Text REtrieval Conference (TREC) has shown that differences in human judgments of relevance do not affect the relative measured performance of retrieval systems. Based on this result, we propose and describe the initial results of a new evaluation methodology that replaces human relevance judgments with a randomly selected mapping of documents to topics, which we refer to as pseudo-relevance judgments. Rankings of systems under our methodology correlate positively with official TREC rankings, although the performance of the top systems is not predicted well. The correlations are stable over a variety of pool depths and sampling techniques. With improvements, such a methodology could be useful in evaluating systems such as World-Wide Web search engines, where the set of documents changes too often to make traditional collection construction techniques practical.
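The evaluation pipeline the abstract describes can be sketched in a few steps: randomly mark pooled documents as pseudo-relevant, score each system's ranked list with a standard measure such as average precision, and correlate the resulting system ranking with a reference ranking. The sketch below is a minimal illustration under assumed names (`pseudo_qrels`, the sampling depth `k`, and a hand-rolled Kendall's tau); it is not the paper's exact procedure.

```python
import random

def pseudo_qrels(pool, k, seed=0):
    """Randomly mark k pooled documents per topic as 'relevant'.

    `pool` maps topic -> list of pooled document ids. The sampling
    depth k and uniform sampling are illustrative assumptions.
    """
    rng = random.Random(seed)
    return {t: set(rng.sample(docs, min(k, len(docs))))
            for t, docs in pool.items()}

def average_precision(ranked, relevant):
    """Non-interpolated average precision of one ranked list."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, 1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_ap(run, qrels):
    """Mean average precision of a run (topic -> ranked doc list)."""
    return sum(average_precision(run[t], qrels[t]) for t in qrels) / len(qrels)

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two orderings of the same systems."""
    items = list(rank_a)
    n = len(items)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = rank_a.index(items[i]) - rank_a.index(items[j])
            b = rank_b.index(items[i]) - rank_b.index(items[j])
            if a * b > 0:
                concordant += 1
            elif a * b < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Ranking systems by `mean_ap` under several random `pseudo_qrels` draws, then comparing those rankings to the official one with `kendall_tau`, mirrors the kind of stability check over pool depths and sampling techniques that the abstract reports.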
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 66-73 |
| Number of pages | 8 |
| Journal | SIGIR Forum (ACM Special Interest Group on Information Retrieval) |
| State | Published - 2001 |
| Externally published | Yes |
| Event | 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, United States, Sep 9-13, 2001 |
ASJC Scopus subject areas
- Management Information Systems
- Hardware and Architecture