Relevance data for language models using maximum likelihood

David Bodoff; Bin Wu; K. Y. Michael Wong

doi:10.1002/asi.10300

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
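The abstract uses the Rocchio heuristic as its comparison baseline. The sketch below shows the textbook Rocchio query-modification formula, q' = α·q + β·centroid(relevant) − γ·centroid(non-relevant), for readers who have not met it. It is not the authors' maximum likelihood method. Several choices are illustrative assumptions: the sparse term-to-weight dict representation, the function names `centroid` and `rocchio`, the default weights α = 1.0, β = 0.75, γ = 0.15, and clipping negative weights to zero.

```python
from collections import defaultdict
from typing import Dict, List

# Assumption: queries and documents are sparse term -> weight dicts.
Vector = Dict[str, float]


def centroid(vectors: List[Vector]) -> Vector:
    """Average a list of sparse vectors; an empty list yields an empty vector."""
    total: Dict[str, float] = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    n = len(vectors)
    return {term: w / n for term, w in total.items()} if n else {}


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Textbook Rocchio query modification (illustrative default weights).

    new_q = alpha * query + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    Terms whose resulting weight is not positive are dropped (a common convention).
    """
    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    new_query: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        w = (alpha * query.get(term, 0.0)
             + beta * rel_c.get(term, 0.0)
             - gamma * non_c.get(term, 0.0))
        if w > 0:
            new_query[term] = w
    return new_query
```

For example, `rocchio({"relevance": 1.0, "feedback": 1.0}, [{"relevance": 2.0, "likelihood": 1.0}, {"likelihood": 3.0}], [{"feedback": 4.0}])` raises "relevance" to 1.75, adds "likelihood" at 1.5, and lowers "feedback" to about 0.4. The fixed α, β, γ weights are hand-tuned constants. The abstract contrasts that kind of heuristic setting with parameters trained from relevance data by maximum likelihood.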

Original language	English (US)
Pages (from-to)	1050-1061
Number of pages	12
Journal	Journal of the American Society for Information Science and Technology
Volume	54
Issue number	11
DOIs	https://doi.org/10.1002/asi.10300
State	Published - Sep 2003
Externally published	Yes

ASJC Scopus subject areas

Software
Information Systems
Human-Computer Interaction
Computer Networks and Communications
Artificial Intelligence

Cite this

@article{9bad86d217f941b7985ba9670c5afaa8,
  title = "Relevance data for language models using maximum likelihood",
  abstract = "We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.",
  author = "David Bodoff and Bin Wu and Wong, {K. Y. Michael}",
  year = "2003",
  month = sep,
  doi = "10.1002/asi.10300",
  language = "English (US)",
  volume = "54",
  pages = "1050--1061",
  journal = "Journal of the American Society for Information Science and Technology",
  issn = "1532-2882",
  publisher = "John Wiley and Sons Ltd",
  number = "11",
}

TY  - JOUR
T1  - Relevance data for language models using maximum likelihood
AU  - Bodoff, David
AU  - Wu, Bin
AU  - Wong, K. Y. Michael
PY  - 2003/9
Y1  - 2003/9
N2  - We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
AB  - We present a preliminary empirical test of a maximum likelihood approach to using relevance data for training information retrieval (IR) parameters. Similar to language models, our method uses explicitly hypothesized distributions for documents and queries, but we add to this an explicitly hypothesized distribution for relevance judgments. The method unifies document-oriented and query-oriented views. Performance is better than the Rocchio heuristic for document and/or query modification. The maximum likelihood methodology also motivates a heuristic estimate of the MLE optimization. The method can be used to test competing hypotheses regarding the processes of authors' term selection, searchers' term selection, and assessors' relevancy judgments.
UR  - http://www.scopus.com/inward/record.url?scp=0041328369&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0041328369&partnerID=8YFLogxK
U2  - 10.1002/asi.10300
DO  - 10.1002/asi.10300
M3  - Article
AN  - SCOPUS:0041328369
SN  - 1532-2882
VL  - 54
SP  - 1050
EP  - 1061
JO  - Journal of the American Society for Information Science and Technology
JF  - Journal of the American Society for Information Science and Technology
IS  - 11
ER  -
