An Example-Based Mapping Method for Text Categorization and Retrieval

Research output: Contribution to journal, Article, peer-review

269 Scopus citations

Abstract

A unified model for text categorization and text retrieval is introduced. We use a training set of manually categorized documents to learn word-category associations, and use these associations to predict the categories of arbitrary documents. Similarly, we use a training set of queries and their related documents to obtain empirical associations between query words and indexing terms of documents, and use these associations to predict the related documents of arbitrary queries. A Linear Least Squares Fit (LLSF) technique is employed to estimate the likelihood of these associations. Document collections from the MEDLINE database and Mayo patient records are used to study the effectiveness of our approach and how much that effectiveness depends on the choice of training data, indexing language, word-weighting scheme, and morphological canonicalization. Alternative methods are also tested on these data collections for comparison. It is evident that the LLSF approach makes effective use of the relevance information contained in human decisions of categorization and retrieval, and achieves a semantic mapping of free texts to their representations in an indexing language. Such a semantic mapping leads to a significant improvement in categorization and retrieval, compared to alternative approaches.
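
The mapping described in the abstract can be illustrated with a minimal sketch. It assumes a bag-of-words matrix X (one row per training document) and a 0/1 category matrix Y, and fits a word-to-category matrix W that minimizes the squared error of XW against Y. The vocabulary, categories, toy data, and function names are hypothetical and are not taken from the paper, and NumPy's general least-squares solver stands in for whatever numerical procedure the authors used.

```python
import numpy as np

# Hypothetical toy vocabulary and category set (not from the paper).
vocab = ["heart", "attack", "lung", "cancer", "tumor"]
categories = ["cardiology", "oncology"]

# Rows are training documents, columns are word counts over `vocab`.
X = np.array([
    [1, 1, 0, 0, 0],   # "heart attack"
    [1, 0, 0, 0, 0],   # "heart"
    [0, 0, 1, 1, 0],   # "lung cancer"
    [0, 0, 0, 1, 1],   # "cancer tumor"
], dtype=float)

# Rows are the same documents, columns are manual category assignments.
Y = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
], dtype=float)


def fit_llsf(X, Y):
    """Return W minimizing ||X @ W - Y||_F (minimum-norm solution)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W  # shape: (number of words, number of categories)


def rank_categories(W, x):
    """Score every category for word vector x; return them best first."""
    scores = x @ W
    order = np.argsort(-scores)
    return [(categories[i], float(scores[i])) for i in order]


W = fit_llsf(X, Y)
query = np.array([0, 0, 0, 1, 0], dtype=float)  # a document containing only "cancer"
ranking = rank_categories(W, query)
print(ranking)
```

Under the same assumptions, the retrieval case would replace the rows of X with training queries and the columns of Y with the indexing terms of their related documents; the fitting step is unchanged.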

Original language: English (US)
Pages (from-to): 252-277
Number of pages: 26
Journal: ACM Transactions on Information Systems (TOIS)
Volume: 12
Issue number: 3
State: Published - Jan 7 1994
Externally published: Yes

Keywords

  • document categorization
  • query categorization
  • statistical learning of human decisions

ASJC Scopus subject areas

  • Information Systems
  • General Business, Management and Accounting
  • Computer Science Applications
