A randomized controlled trial of concept based indexing of Web page content.

P. L. Elkin, A. Ruggieri, L. Bergstrom, B. A. Bauer, M. Lee, P. V. Ogren, Christopher Chute

Research output: Contribution to journal (Article)

Abstract

OBJECTIVE: Medical information is increasingly being presented in a web-enabled format. Medical journals, guidelines, and textbooks are all accessible in a web-based format. It would be desirable to link these reference sources to the electronic medical record to provide education, to facilitate guideline implementation and usage, and to provide decision support. In order for these rich information sources to be accessed via the medical record, they will need to be indexed by a single comparable underlying reference terminology.

METHODS: We took a random sample of 100 web pages out of the 6,000 web pages on the Mayo Clinic's Health Oasis web site. The web pages were divided into four datasets, each containing 25 pages. These were manually reviewed by four clinicians to identify all of the health concepts present (R1DA, R2DB, R3DC, R4DD). The web pages were simultaneously indexed using the SNOMED-RT beta release. The indexing engine has been previously described and validated. A new clinician reviewed the indexed web pages to determine the accuracy of the automated mappings as compared with the human-identified concepts (R4DA, R3DB, R2DC, R1DD).

RESULTS: This review found 13,220 health concepts. Of these, 10,383 concepts were identified by the initial human review (78.5% +/- 3.6%). The automated process correctly identified 10,083 concepts (76.3% +/- 4.0%) from within this corpus. The computer identified 2,420 concepts that were not identified by the clinician's review but were, upon further consideration, important to include as health concepts. There was on average a 17.1% +/- 3.5% variability in the human reviewers' ability to identify the important health concepts within web page content. Concept-based indexing provided a positive predictive value (PPV) of finding a health concept of 79.3%, as compared with keyword indexing, which had a PPV of only 33.7% (p < 0.001).

CONCLUSION: SNOMED-RT is a reasonable ontology for web page indexing. Concept-based indexing provides significantly greater accuracy in identifying health concepts when compared with keyword indexing.
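
The headline percentages in the results can be reproduced from the reported counts. The sketch below is illustrative only: the constant and function names are assumptions introduced here, not taken from the paper. The abstract does not give the true-positive and false-positive counts behind the PPV figures, nor the data behind the +/- intervals, so `ppv` is shown only as the standard definition, TP / (TP + FP), without study data.

```python
# Counts reported in the abstract (names are illustrative, not from the paper).
TOTAL_CONCEPTS = 13_220      # health concepts found by the review
HUMAN_IDENTIFIED = 10_383    # identified by the initial human review
AUTOMATED_CORRECT = 10_083   # correctly identified by the automated process


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value, TP / (TP + FP), as a fraction in [0, 1]."""
    return true_positives / (true_positives + false_positives)


human_rate = percent(HUMAN_IDENTIFIED, TOTAL_CONCEPTS)
automated_rate = percent(AUTOMATED_CORRECT, TOTAL_CONCEPTS)
print(f"human review: {human_rate}%  automated: {automated_rate}%")
```

Both rates use the same denominator, the 13,220 concepts found by the review, which is why they can be compared directly.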

Original language: English (US)
Pages (from-to): 220-224
Number of pages: 5
Journal: Proceedings / AMIA ... Annual Symposium. AMIA Symposium
Publication status: Published - 2000
Externally published: Yes

Cite this

Elkin, P. L., Ruggieri, A., Bergstrom, L., Bauer, B. A., Lee, M., Ogren, P. V., & Chute, C. (2000). A randomized controlled trial of concept based indexing of Web page content. Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 220-224.