TY - JOUR
T1 - Words or concepts: the features of indexing units and their optimal use in information retrieval
AU - Yang, Y.
AU - Chute, C. G.
N1 - Copyright: This record is sourced from MEDLINE®/PubMed®, a database of the U.S. National Library of Medicine
PY - 1993
Y1 - 1993
N2 - Words or Concepts, which are a better choice for indexing the contents of documents? The answer depends on what method is used for retrieval. This paper studies the effects of using canonical concepts versus document words in different retrieval systems with a testing collection of MEDLINE documents. In our tests, for a retrieval system which does not use any human knowledge, using words yielded better retrieval results, while using concepts suffered from a vocabulary difference between canonical expressions of concepts and non-canonical words in queries or documents. For a system which depends on the UMLS synonym set for a mapping from queries or documents to canonical concepts, the retrieval results were slightly better than the case of not using the synonyms, but still worse than the systems using words. For the systems which automatically "learn" empirical connections between words and concepts from examples in the testing collection, the vocabulary problem was effectively solved, and the results of using concepts were competitive or better, compared to those using words.
UR - http://www.scopus.com/inward/record.url?scp=0027764790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0027764790&partnerID=8YFLogxK
M3 - Article
C2 - 8130562
AN - SCOPUS:0027764790
SN - 0195-4210
SP - 685
EP - 689
JO - Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care
JF - Proceedings / the ... Annual Symposium on Computer Application [sic] in Medical Care. Symposium on Computer Applications in Medical Care
ER -