TY - JOUR
T1 - Using compound codes for automatic classification of clinical diagnoses
AU - Pakhomov, Serguei V.
AU - Buntrock, James D.
AU - Chute, Christopher G.
N1 - Funding Information:
We'd like to thank Barbara Abbot, Deborah Albrecht and Pauline Funk for sharing their HICDA coding expertise as well as Ted Pedersen for his advice on training automatic classifiers.
PY - 2004
Y1 - 2004
AB - Classification of diagnoses (a.k.a. coding) is the central part of current concept-based medical IR systems. Some classification systems contain over 30,000 distinct codes, which makes classifying clinical documents a time-consuming, labor-intensive, and error-prone process. This paper presents a simple methodology for cleaning up and reusing existing manually coded diagnostic statements, mainly extracted from clinical notes, to build predictive models using a sparse-feature implementation of a Naïve Bayes classifier. One of the problems addressed is that diagnostic statements often contain several diagnoses and are assigned several codes, resulting in a 'many-to-many' mapping problem. We investigate one possible way of solving this problem by introducing compound (multiple-code) categories. We present experimental results of classifying >16,000 randomly selected diagnostic strings into 19 top-level categories. A small improvement (3%) from using compound categories over simple categories indicates that using multiple-code categories is a promising solution, although clearly in need of further research and refinement.
KW - Automatic classification
KW - clinical diagnoses
KW - concept indexing
UR - http://www.scopus.com/inward/record.url?scp=77955512211&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955512211&partnerID=8YFLogxK
U2 - 10.3233/978-1-60750-949-3-411
DO - 10.3233/978-1-60750-949-3-411
M3 - Article
C2 - 15360845
AN - SCOPUS:77955512211
SN - 0926-9630
VL - 107
SP - 411
EP - 415
JO - Studies in health technology and informatics
JF - Studies in health technology and informatics
ER -