Using compound codes for automatic classification of clinical diagnoses

Serguei V. Pakhomov; James D. Buntrock; Christopher G. Chute

doi:10.3233/978-1-60750-949-3-411

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a 'many-to-many' mapping problem. We investigate one possible way of solving this problem by introducing compound (multiple-code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories over simple categories indicates that multiple-code categories are a promising solution, although clearly in need of further research and refinement.
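The compound-category idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SparseNaiveBayes`, the helper `compound_label`, the `+` separator, the add-one smoothing, and the toy codes `ENDO` and `CIRC` are all assumptions made for illustration. A statement that carries several codes is mapped to one order-independent compound label, so the many-to-many mapping becomes an ordinary single-label problem for a Naïve Bayes classifier that stores only observed token counts.

```python
import math
from collections import Counter, defaultdict


def compound_label(codes):
    """Collapse a set of codes into one order-independent compound category."""
    return "+".join(sorted(set(codes)))


class SparseNaiveBayes:
    """Multinomial Naive Bayes that stores only observed (label, token) counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # additive smoothing constant (assumed)
        self.label_counts = Counter()             # training statements per label
        self.token_counts = defaultdict(Counter)  # label -> token -> count
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.label_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total)
            for tok in tokens:
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training data: (diagnostic statement, assigned codes).
train = [
    ("diabetes mellitus", ["ENDO"]),
    ("hypertension", ["CIRC"]),
    ("diabetes mellitus and hypertension", ["CIRC", "ENDO"]),
    ("essential hypertension", ["CIRC"]),
    ("type 2 diabetes", ["ENDO"]),
]
model = SparseNaiveBayes().fit(
    [text for text, _ in train],
    [compound_label(codes) for _, codes in train],
)
print(model.predict("diabetes mellitus and hypertension"))
```

A predicted compound label can be split on the separator to recover the individual codes. The trade-off is that every distinct code combination becomes its own class, so rare combinations have few training examples.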

Original language	English (US)
Pages (from-to)	411-415
Number of pages	5
Journal	Studies in health technology and informatics
Volume	107
DOIs	https://doi.org/10.3233/978-1-60750-949-3-411
State	Published - 2004
Externally published	Yes

Keywords

Automatic classification
clinical diagnoses
concept indexing

ASJC Scopus subject areas

Biomedical Engineering
Health Informatics
Health Information Management

Cite this

@article{ef093c4b4d9a424e8c5f9a6f822de843,

title = "Using compound codes for automatic classification of clinical diagnoses",

abstract = "Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Na{\"i}ve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a 'many-to-many' mapping problem. We investigate one possible way of solving this problem by introducing compound (multiple-code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories over simple categories indicates that multiple-code categories are a promising solution, although clearly in need of further research and refinement.",

keywords = "Automatic classification, clinical diagnoses, concept indexing",

author = "Pakhomov, {Serguei V.} and Buntrock, {James D.} and Chute, {Christopher G.}",

note = "Funding Information: We'd like to thank Barbara Abbot, Deborah Albrecht and Pauline Funk for sharing their HICDA coding expertise as well as Ted Pedersen for his advice on training automatic classifiers.",

year = "2004",

doi = "10.3233/978-1-60750-949-3-411",

language = "English (US)",

volume = "107",

pages = "411--415",

journal = "Studies in health technology and informatics",

issn = "0926-9630",

publisher = "IOS Press",

}

TY - JOUR

T1 - Using compound codes for automatic classification of clinical diagnoses

AU - Pakhomov, Serguei V.

AU - Buntrock, James D.

AU - Chute, Christopher G.

N1 - Funding Information: We'd like to thank Barbara Abbot, Deborah Albrecht and Pauline Funk for sharing their HICDA coding expertise as well as Ted Pedersen for his advice on training automatic classifiers.

PY - 2004

Y1 - 2004

AB - Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a 'many-to-many' mapping problem. We investigate one possible way of solving this problem by introducing compound (multiple-code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories over simple categories indicates that multiple-code categories are a promising solution, although clearly in need of further research and refinement.

KW - Automatic classification

KW - clinical diagnoses

KW - concept indexing

UR - http://www.scopus.com/inward/record.url?scp=77955512211&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955512211&partnerID=8YFLogxK

U2 - 10.3233/978-1-60750-949-3-411

DO - 10.3233/978-1-60750-949-3-411

M3 - Article

C2 - 15360845

AN - SCOPUS:77955512211

SN - 0926-9630

VL - 107

SP - 411

EP - 415

JO - Studies in health technology and informatics

JF - Studies in health technology and informatics

ER -
