Relevance data for language models using maximum likelihood

David Bodoff, Bin Wu, K. Y. Michael Wong

Research output: Contribution to journal › Article

Abstract

We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
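The abstract uses the Rocchio heuristic as its baseline but does not spell it out, and it does not give the authors' own likelihood model. For reference only, here is a minimal sketch of the textbook Rocchio query update. It assumes sparse term-weight vectors stored as Python dicts and the commonly cited defaults α = 1, β = 0.75, γ = 0.15. These weights, the helper names `centroid` and `rocchio`, and the clipping of non-positive weights are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict

# Sparse term-weight vectors are represented as {term: weight} dicts.
Vector = dict[str, float]


def centroid(vectors: list[Vector]) -> Vector:
    """Average a list of sparse vectors; returns {} for an empty list."""
    total: Vector = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio(query: Vector, relevant: list[Vector], nonrelevant: list[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Textbook Rocchio update:
    q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Default weights are common textbook choices (an assumption), not values
    reported by Bodoff, Wu and Wong (2003). Terms whose final weight is <= 0
    are dropped, which is a common convention rather than part of the formula.
    """
    rel_c = centroid(relevant)
    nonrel_c = centroid(nonrelevant)
    updated: Vector = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for term, weight in rel_c.items():
        updated[term] += beta * weight      # move toward relevant items
    for term, weight in nonrel_c.items():
        updated[term] -= gamma * weight     # move away from non-relevant items
    return {term: w for term, w in updated.items() if w > 0}


if __name__ == "__main__":
    q = {"maximum": 1.0, "likelihood": 1.0}
    rel = [{"likelihood": 1.0, "relevance": 1.0},
           {"relevance": 1.0, "feedback": 1.0}]
    nonrel = [{"maximum": 1.0, "margin": 1.0}]
    print(rocchio(q, rel, nonrel))
```

Because `rocchio` only manipulates vectors, it could in principle be run with the roles swapped, nudging a document vector toward queries judged relevant to it. That is roughly the "document modification" direction the abstract mentions, though the paper's actual experimental setup may differ.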

Original language: English (US)
Pages (from-to): 1050-1061
Number of pages: 12
Journal: Journal of the American Society for Information Science and Technology
Volume: 54
Issue number: 11
DOI: 10.1002/asi.10300
State: Published - Sep 1 2003
Externally published: Yes

Fingerprint

Maximum likelihood
Maximum likelihood estimation
Heuristics
Language
Information retrieval
Methodology
Performance
Query
Language model
Relevance judgments
Empirical test

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

Relevance data for language models using maximum likelihood. / Bodoff, David; Wu, Bin; Wong, K. Y. Michael.

In: Journal of the American Society for Information Science and Technology, Vol. 54, No. 11, 01.09.2003, p. 1050-1061.

@article{9bad86d217f941b7985ba9670c5afaa8,
title = "Relevance data for language models using maximum likelihood",
abstract = "We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.",
author = "David Bodoff and Bin Wu and Wong, {K. Y. Michael}",
year = "2003",
month = sep,
day = "1",
doi = "10.1002/asi.10300",
language = "English (US)",
volume = "54",
pages = "1050--1061",
journal = "Journal of the American Society for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "11",

}

TY  - JOUR
T1  - Relevance data for language models using maximum likelihood
AU  - Bodoff, David
AU  - Wu, Bin
AU  - Wong, K. Y. Michael
PY  - 2003/9/1
Y1  - 2003/9/1
N2  - We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
AB  - We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
UR  - http://www.scopus.com/inward/record.url?scp=0041328369&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0041328369&partnerID=8YFLogxK
U2  - 10.1002/asi.10300
DO  - 10.1002/asi.10300
M3  - Article
AN  - SCOPUS:0041328369
VL  - 54
SP  - 1050
EP  - 1061
JO  - Journal of the American Society for Information Science and Technology
JF  - Journal of the American Society for Information Science and Technology
SN  - 2330-1635
IS  - 11
ER  - 