Use of computerized algorithm to identify individuals in need of testing for celiac disease

Jonas F. Ludvigsson, Jyotishman Pathak, Sean Murphy, Matthew Durski, Phillip S. Kirsch, Christopher Chute, Euijung Ryu, Joseph A. Murray

Research output: Contribution to journal › Article

Abstract

Background and aim: Celiac disease (CD) is a lifelong immune-mediated disease with excess mortality. Early diagnosis is important to minimize disease symptoms, complications, and consumption of healthcare resources. Most patients remain undiagnosed. We developed two electronic medical record (EMR)-based algorithms to identify patients at high risk of CD and in need of CD screening.

Methods: (I) Using natural language processing (NLP), we searched EMRs for 16 free text (and related) terms in 216 CD patients and 280 controls. (II) EMRs were also searched for ICD9 (International Classification of Disease) codes suggesting an increased risk of CD in 202 patients with CD and 524 controls. For each approach, we determined the optimal number of hits to be assigned as CD cases. To assess performance of these algorithms, sensitivity and specificity were calculated.

Results: Using two hits as the cut-off, the NLP algorithm identified 72.9% of all celiac patients (sensitivity), and ruled out CD in 89.9% of the controls (specificity). In a representative US population of individuals without a prior celiac diagnosis (assuming that 0.6% had undiagnosed CD), this NLP algorithm could identify a group of individuals where 4.2% would have CD (positive predictive value). ICD9 code search using three hits as the cut-off had a sensitivity of 17.1% and a specificity of 88.5% (positive predictive value was 0.9%).

Discussion and conclusions: This study shows that computerized EMR-based algorithms can help identify patients at high risk of CD. NLP-based techniques demonstrate higher sensitivity and positive predictive values than algorithms based on ICD9 code searches.
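
The positive predictive values quoted in the abstract can be reproduced from the reported sensitivity, specificity, and the assumed 0.6% prevalence of undiagnosed CD. The following is a minimal sketch using Bayes' rule; the abstract does not state how the authors computed these values, so the function name and approach here are illustrative assumptions rather than the study's own code.

```python
def positive_predictive_value(sensitivity: float,
                              specificity: float,
                              prevalence: float) -> float:
    """Return P(disease | positive result) via Bayes' rule."""
    # Fraction of the whole population that is diseased AND flagged.
    true_positives = sensitivity * prevalence
    # Fraction of the whole population that is healthy BUT flagged.
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Prevalence of undiagnosed CD assumed in the abstract (0.6%).
PREVALENCE = 0.006

# Sensitivity and specificity as reported in the abstract.
nlp_ppv = positive_predictive_value(0.729, 0.899, PREVALENCE)
icd9_ppv = positive_predictive_value(0.171, 0.885, PREVALENCE)

print(f"NLP algorithm PPV (two-hit cut-off):  {nlp_ppv:.1%}")
print(f"ICD9 search PPV (three-hit cut-off):  {icd9_ppv:.1%}")
```

Under these inputs the sketch gives roughly 4.2% for the NLP algorithm and 0.9% for the ICD9 search, in line with the figures in the abstract. The low absolute values reflect the low assumed prevalence: even with about 90% specificity, false positives from the large undiseased group outnumber true positives.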

Original language: English (US)
Journal: Journal of the American Medical Informatics Association
Volume: 20
Issue number: E2
DOI: 10.1136/amiajnl-2013-001924
State: Published - 2013
Externally published: Yes

Fingerprint

Celiac Disease
Natural Language Processing
Electronic Health Records
Abdomen
Computerized Medical Records Systems
Immune System Diseases
International Classification of Diseases
Early Diagnosis
Delivery of Health Care
Sensitivity and Specificity

ASJC Scopus subject areas

  • Health Informatics

Cite this

Use of computerized algorithm to identify individuals in need of testing for celiac disease. / Ludvigsson, Jonas F.; Pathak, Jyotishman; Murphy, Sean; Durski, Matthew; Kirsch, Phillip S.; Chute, Christopher; Ryu, Euijung; Murray, Joseph A.

In: Journal of the American Medical Informatics Association, Vol. 20, No. E2, 2013.

@article{dcd3fc2b96f04207b3ea3e693129298c,
title = "Use of computerized algorithm to identify individuals in need of testing for celiac disease",
author = "Ludvigsson, {Jonas F.} and Jyotishman Pathak and Sean Murphy and Matthew Durski and Kirsch, {Phillip S.} and Christopher Chute and Euijung Ryu and Murray, {Joseph A.}",
year = "2013",
doi = "10.1136/amiajnl-2013-001924",
language = "English (US)",
volume = "20",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "E2",

}

TY - JOUR

T1 - Use of computerized algorithm to identify individuals in need of testing for celiac disease

AU - Ludvigsson, Jonas F.

AU - Pathak, Jyotishman

AU - Murphy, Sean

AU - Durski, Matthew

AU - Kirsch, Phillip S.

AU - Chute, Christopher

AU - Ryu, Euijung

AU - Murray, Joseph A.

PY - 2013

Y1 - 2013

AB - Background and aim: Celiac disease (CD) is a lifelong immune-mediated disease with excess mortality. Early diagnosis is important to minimize disease symptoms, complications, and consumption of healthcare resources. Most patients remain undiagnosed. We developed two electronic medical record (EMR)-based algorithms to identify patients at high risk of CD and in need of CD screening. Methods: (I) Using natural language processing (NLP), we searched EMRs for 16 free text (and related) terms in 216 CD patients and 280 controls. (II) EMRs were also searched for ICD9 (International Classification of Disease) codes suggesting an increased risk of CD in 202 patients with CD and 524 controls. For each approach, we determined the optimal number of hits to be assigned as CD cases. To assess performance of these algorithms, sensitivity and specificity were calculated. Results: Using two hits as the cut-off, the NLP algorithm identified 72.9% of all celiac patients (sensitivity), and ruled out CD in 89.9% of the controls (specificity). In a representative US population of individuals without a prior celiac diagnosis (assuming that 0.6% had undiagnosed CD), this NLP algorithm could identify a group of individuals where 4.2% would have CD (positive predictive value). ICD9 code search using three hits as the cut-off had a sensitivity of 17.1% and a specificity of 88.5% (positive predictive value was 0.9%). Discussion and conclusions: This study shows that computerized EMR-based algorithms can help identify patients at high risk of CD. NLP-based techniques demonstrate higher sensitivity and positive predictive values than algorithms based on ICD9 code searches.

UR - http://www.scopus.com/inward/record.url?scp=84890404302&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890404302&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2013-001924

DO - 10.1136/amiajnl-2013-001924

M3 - Article

C2 - 23956016

AN - SCOPUS:84890404302

VL - 20

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - E2

ER -