A comparative study of supervised learning as applied to acronym expansion in clinical reports.

Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Christopher G. Chute

Research output: Contribution to journal › Article

Abstract

Electronic medical records (EMR) constitute a valuable resource of patient-specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. The accuracy of all three methods when using all features consistently approaches or exceeds 90%, even when the baseline majority classifier is below 50%.
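The setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the three learners (a naïve Bayes classifier with add-one smoothing) and only two of the three feature types (unigrams and bigrams in a context window), because part-of-speech tags would require an external tagger. The window size of 3, the feature-name prefixes, the function and class names, and the toy training sentences for the acronym "RA" are all assumptions made for illustration and do not come from the paper's corpus.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, target, window=3):
    """Unigrams and bigrams from a window around the first occurrence of `target`."""
    i = tokens.index(target)
    left = tokens[max(0, i - window):i]
    right = tokens[i + 1:i + 1 + window]
    feats = ["uni=" + w for w in left + right]
    # Bigrams are formed within each side, so none spans the acronym itself.
    for side in (left, right):
        feats += ["bi=" + a + "_" + b for a, b in zip(side, side[1:])]
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, feature_lists, labels):
        self.label_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                if f in self.vocab:  # features unseen in training are ignored
                    score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_baseline(labels):
    """Baseline that always predicts the most frequent expansion."""
    return Counter(labels).most_common(1)[0][0]


# Invented toy examples (hypothetical, not taken from the paper's data).
TRAIN = [
    ("patient with RA on methotrexate for joint pain", "rheumatoid arthritis"),
    ("history of RA with joint swelling", "rheumatoid arthritis"),
    ("echo shows dilated RA and right ventricle", "right atrium"),
    ("catheter advanced into the RA without difficulty", "right atrium"),
]

model = NaiveBayes().fit(
    [context_features(text.split(), "RA") for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

print(model.predict(context_features("RA flare with joint pain".split(), "RA")))
```

Swapping in a decision tree or an SVM would reuse the same feature lists after converting them to count vectors. The majority baseline is the reference point the abstract compares against: a classifier is useful only to the extent that it beats always predicting the most common expansion.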

Original language: English (US)
Pages (from-to): 399-403
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2006
Externally published: Yes

ASJC Scopus subject areas

  • Medicine (all)
