A randomized controlled trial of concept based indexing of Web page content.

P. L. Elkin, A. Ruggieri, L. Bergstrom, B. A. Bauer, M. Lee, P. V. Ogren, Christopher Chute

Research output: Contribution to journal › Article

Abstract

OBJECTIVE: Medical information is increasingly being presented in web-enabled formats, and medical journals, guidelines, and textbooks are all accessible on the web. It would be desirable to link these reference sources to the electronic medical record to provide education, to facilitate guideline implementation and use, and to provide decision support. For these rich information sources to be accessed via the medical record, they will need to be indexed by a single, comparable underlying reference terminology.

METHODS: We took a random sample of 100 web pages from the 6,000 web pages on the Mayo Clinic's Health Oasis web site. The pages were divided into four datasets of 25 pages each. Four clinicians manually reviewed these datasets to identify all of the health concepts present (R1DA, R2DB, R3DC, R4DD). The same web pages were simultaneously indexed using the SNOMED-RT beta release, with an indexing engine that has been previously described and validated. A different clinician then reviewed each set of indexed web pages to determine the accuracy of the automated mappings compared with the human-identified concepts (R4DA, R3DB, R2DC, R1DD).

RESULTS: This review found 13,220 health concepts. Of these, 10,383 concepts (78.5% +/- 3.6%) were identified by the initial human review. The automated process correctly identified 10,083 concepts (76.3% +/- 4.0%) within this corpus. The computer identified 2,420 concepts that were not identified in the clinicians' initial review but that, on further consideration, were important to include as health concepts. There was, on average, a 17.1% +/- 3.5% variability in the human reviewers' ability to identify the important health concepts within web page content. Concept-based indexing had a positive predictive value (PPV) for finding a health concept of 79.3%, compared with only 33.7% for keyword indexing (p < 0.001).

CONCLUSION: SNOMED-RT is a reasonable ontology for web page indexing. Concept-based indexing identifies health concepts with significantly greater accuracy than keyword indexing.
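The study design and the headline proportions can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: it assumes the "random sample" was drawn uniformly without replacement, represents pages by hypothetical integer IDs instead of real URLs, and reads the labels R1DA ... R4DD as reviewer/dataset pairings, with validation reversing the reviewer order so that no clinician checks the dataset they reviewed initially. The helper names (`sample_and_split`, `assignments`, `proportion`, `ppv`) are hypothetical. PPV uses its standard definition, true positives / (true positives + false positives); the abstract reports only the resulting percentages, not the underlying counts.

```python
import random

TOTAL_PAGES = 6_000  # pages on the Health Oasis site (from the abstract)
SAMPLE_SIZE = 100    # pages sampled for the study
DATASETS = ["DA", "DB", "DC", "DD"]
REVIEWERS = ["R1", "R2", "R3", "R4"]


def sample_and_split(seed: int = 0) -> dict[str, list[int]]:
    """Draw 100 distinct page IDs and split them into four datasets of 25.

    Page IDs are hypothetical integers standing in for real page URLs.
    """
    rng = random.Random(seed)
    pages = rng.sample(range(TOTAL_PAGES), SAMPLE_SIZE)  # without replacement
    size = SAMPLE_SIZE // len(DATASETS)
    return {d: pages[i * size:(i + 1) * size] for i, d in enumerate(DATASETS)}


def assignments() -> tuple[dict[str, str], dict[str, str]]:
    """Map each dataset to its initial reviewer and its validating reviewer.

    Initial review pairs R1-DA ... R4-DD; validation reverses the order
    (R4-DA ... R1-DD), so no dataset is validated by its initial reviewer.
    """
    initial = dict(zip(DATASETS, REVIEWERS))
    validation = dict(zip(DATASETS, reversed(REVIEWERS)))
    return initial, validation


def proportion(part: int, whole: int) -> float:
    """Percentage of the reference concept set covered by `part`."""
    return 100 * part / whole


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: fraction of returned hits that are real concepts."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    initial, validation = assignments()
    print(initial)     # {'DA': 'R1', 'DB': 'R2', 'DC': 'R3', 'DD': 'R4'}
    print(validation)  # {'DA': 'R4', 'DB': 'R3', 'DC': 'R2', 'DD': 'R1'}
    # Counts from the abstract; 13,220 is the total concept set found in review.
    print(f"{proportion(10_383, 13_220):.1f}%")  # 78.5% (initial human review)
    print(f"{proportion(10_083, 13_220):.1f}%")  # 76.3% (automated indexing)
```

The arithmetic confirms that the two reported percentages are simple proportions of the 13,220 concepts found in review; the +/- spreads reported alongside them come from per-dataset variation that the abstract does not itemize.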

Original language: English (US)
Pages (from-to): 220-224
Number of pages: 5
Journal: Proceedings / AMIA ... Annual Symposium. AMIA Symposium
State: Published - 2000
Externally published: Yes

Fingerprint

Randomized Controlled Trials
Health
Systematized Nomenclature of Medicine
Guidelines
Textbooks
Electronic Health Records
Terminology
Medical Records
Education

Cite this

Elkin, P. L., Ruggieri, A., Bergstrom, L., Bauer, B. A., Lee, M., Ogren, P. V., & Chute, C. (2000). A randomized controlled trial of concept based indexing of Web page content. Proceedings / AMIA ... Annual Symposium. AMIA Symposium, 220-224.

@article{8c9bb0842798459fb1ab77cb919154bb,
title = "A randomized controlled trial of concept based indexing of Web page content.",
author = "Elkin, {P. L.} and A. Ruggieri and L. Bergstrom and Bauer, {B. A.} and M. Lee and Ogren, {P. V.} and Christopher Chute",
year = "2000",
language = "English (US)",
pages = "220--224",
journal = "Proceedings / AMIA . Annual Symposium. AMIA Symposium",
issn = "1531-605X",
publisher = "Hanley & Belfus",

}

TY - JOUR

T1 - A randomized controlled trial of concept based indexing of Web page content.

AU - Elkin, P. L.

AU - Ruggieri, A.

AU - Bergstrom, L.

AU - Bauer, B. A.

AU - Lee, M.

AU - Ogren, P. V.

AU - Chute, Christopher

PY - 2000

Y1 - 2000

UR - http://www.scopus.com/inward/record.url?scp=0034566845&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034566845&partnerID=8YFLogxK

M3 - Article

SP - 220

EP - 224

JO - Proceedings / AMIA . Annual Symposium. AMIA Symposium

JF - Proceedings / AMIA . Annual Symposium. AMIA Symposium

SN - 1531-605X

ER -