A comparative study of supervised learning as applied to acronym expansion in clinical reports.

Mahesh Joshi, Serguei Pakhomov, Ted Pedersen, Christopher Chute

Research output: Contribution to journal › Article

Abstract

Electronic medical records (EMR) constitute a valuable resource of patient-specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. When all features are used, the accuracy of all three methods consistently approaches or exceeds 90%, even when the majority-class baseline is below 50%.
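The abstract names the feature types and learners but not the context window or implementation. The Python sketch below is a minimal, hypothetical illustration under stated assumptions, not the authors' code: it extracts unigram, bigram and position-marked part-of-speech features from a ±2-token window around the acronym (the window size is an assumption), trains a multinomial naïve Bayes classifier with add-one smoothing, and compares its accuracy with the majority-sense baseline. The sentences, POS tags and sense labels are invented toy data, not drawn from the study's corpus, and the decision tree and SVM learners are not shown.

```python
import math
from collections import Counter, defaultdict


def context_features(tokens, tags, i, window=2):
    """Feature strings for the acronym at tokens[i]: unigrams, bigrams and
    position-marked POS tags from a +/- `window` token context (window size
    is an illustrative assumption)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    feats = []
    for j in range(lo, hi):
        if j == i:
            continue  # skip the acronym itself; it is identical in every instance
        feats.append(f"uni={tokens[j].lower()}")
        feats.append(f"pos{j - i:+d}={tags[j]}")  # e.g. "pos-1=IN"
    ctx = [t.lower() for t in tokens[lo:hi]]
    feats.extend(f"bi={a}_{b}" for a, b in zip(ctx, ctx[1:]))
    return feats


class NaiveBayes:
    """Multinomial naive Bayes over feature strings, with add-one smoothing."""

    def fit(self, X, y):
        self.priors = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.priors}
        return self

    def predict(self, feats):
        V = len(self.vocab)

        def log_score(c):
            s = math.log(self.priors[c] / self.n)
            for f in feats:
                if f in self.vocab:  # ignore features never seen in training
                    s += math.log((self.counts[c][f] + 1) / (self.totals[c] + V))
            return s

        return max(self.priors, key=log_score)


def accuracy(pred, gold):
    return sum(p == g for p, g in zip(pred, gold)) / len(gold)


def majority_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training sense
    (ties broken alphabetically so the result is deterministic)."""
    counts = Counter(train_labels)
    label = max(sorted(counts), key=counts.get)
    return accuracy([label] * len(test_labels), test_labels)


# Hypothetical toy data: (tokens, POS tags, acronym index, sense).
PE, PX = "pulmonary embolism", "physical exam"
train = [
    (["ruled", "out", "PE", "with", "CT", "angiogram"], ["VBN", "RP", "NN", "IN", "NN", "NN"], 2, PE),
    (["CT", "angiogram", "negative", "for", "PE"], ["NN", "NN", "JJ", "IN", "NN"], 4, PE),
    (["PE", ":", "lungs", "clear", "to", "auscultation"], ["NN", ":", "NNS", "JJ", "TO", "NN"], 0, PX),
    (["on", "PE", ",", "lungs", "clear"], ["IN", "NN", ",", "NNS", "JJ"], 1, PX),
]
test = [
    (["rule", "out", "PE", "with", "CT"], ["VB", "RP", "NN", "IN", "NN"], 2, PE),
    (["brief", "PE", ",", "lungs", "clear"], ["JJ", "NN", ",", "NNS", "JJ"], 1, PX),
    (["evaluate", "for", "PE"], ["VB", "IN", "NN"], 2, PE),
]

model = NaiveBayes().fit([context_features(t, tg, i) for t, tg, i, _ in train],
                         [lab for *_, lab in train])
pred = [model.predict(context_features(t, tg, i)) for t, tg, i, _ in test]
gold = [lab for *_, lab in test]
print(pred)  # → ['pulmonary embolism', 'physical exam', 'pulmonary embolism']
print(accuracy(pred, gold), majority_baseline([lab for *_, lab in train], gold))
# → 1.0 0.3333333333333333
```

Combining all three feature types into one bag of strings mirrors the "all features" setting the abstract describes; dropping the `uni=`, `bi=` or `pos` entries would approximate the single-feature-type runs. With only three toy test items the numbers say nothing about real performance.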

Original language: English (US)
Pages (from-to): 399-403
Number of pages: 5
Journal: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
State: Published - 2006
Externally published: Yes

ASJC Scopus subject areas

  • Medicine(all)

Cite this

A comparative study of supervised learning as applied to acronym expansion in clinical reports. / Joshi, Mahesh; Pakhomov, Serguei; Pedersen, Ted; Chute, Christopher.

In: AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium, 2006, p. 399-403.

@article{662a7c26dd1142c69e26a03b88d55366,
title = "A comparative study of supervised learning as applied to acronym expansion in clinical reports.",
abstract = "Electronic medical records (EMR) constitute a valuable resource of patient-specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the na{\"\i}ve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. When all features are used, the accuracy of all three methods consistently approaches or exceeds 90{\%}, even when the majority-class baseline is below 50{\%}.",
author = "Mahesh Joshi and Serguei Pakhomov and Ted Pedersen and Christopher Chute",
year = "2006",
language = "English (US)",
pages = "399--403",
journal = "AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium",
issn = "1559-4076",
publisher = "American Medical Informatics Association",

}

TY  - JOUR
T1  - A comparative study of supervised learning as applied to acronym expansion in clinical reports.
AU  - Joshi, Mahesh
AU  - Pakhomov, Serguei
AU  - Pedersen, Ted
AU  - Chute, Christopher
PY  - 2006
Y1  - 2006
N2  - Electronic medical records (EMR) constitute a valuable resource of patient-specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. When all features are used, the accuracy of all three methods consistently approaches or exceeds 90%, even when the majority-class baseline is below 50%.
AB  - Electronic medical records (EMR) constitute a valuable resource of patient-specific information and are increasingly used for clinical practice and research. Acronyms present a challenge to retrieving information from the EMR because many acronyms are ambiguous with respect to their full form. In this paper we perform a comparative study of supervised acronym disambiguation in a corpus of clinical notes, using three machine learning algorithms: the naïve Bayes classifier, decision trees and Support Vector Machines (SVMs). Our training features include part-of-speech tags, unigrams and bigrams in the context of the ambiguous acronym. We find that the combination of these feature types results in consistently better accuracy than when they are used individually, regardless of the learning algorithm employed. When all features are used, the accuracy of all three methods consistently approaches or exceeds 90%, even when the majority-class baseline is below 50%.
UR  - http://www.scopus.com/inward/record.url?scp=34748874639&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34748874639&partnerID=8YFLogxK
M3  - Article
C2  - 17238371
AN  - SCOPUS:34748874639
SP  - 399
EP  - 403
JO  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
JF  - AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium
SN  - 1559-4076
ER  - 