Evaluating automated approaches to anaphylaxis case classification using unstructured data from the FDA Sentinel System

Robert Ball; Sengwee Toh; Jamie Nolan; Kevin Haynes; Richard Forshee; Taxiarchis Botsis

doi:10.1002/pds.4645

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

Introduction: In May 2008, the Food and Drug Administration launched the Sentinel Initiative, a multi-year program for the establishment of a national electronic monitoring system for medical product safety that led, in 2016, to the launch of the full Sentinel System. Under the Mini-Sentinel pilot, several algorithms for identifying health outcomes of interest, including one for anaphylaxis, were developed and evaluated using data available from the Sentinel common data model.

Purpose: To evaluate whether features extracted from unstructured narrative data using natural language processing (NLP) could be used to classify anaphylaxis cases.

Methods: Using previously developed methods, we extracted features from unstructured narrative data using NLP and applied rule-based and similarity-based algorithms to identify anaphylaxis among 62 potential cases previously classified by human experts as anaphylaxis (N = 33), not anaphylaxis (N = 27), and unknown (N = 2).

Results: The rule-based and similarity-based approaches demonstrated almost equal performance (recall 100% vs 100%, precision 60.3% vs 57.4%, F-measure: 0.753 vs 0.729). Reasons for misclassification included the inability of the algorithms to make the same clinical judgments as human experts about the timing, severity, or presence of alternative explanations; and the identification of terms consistent with anaphylaxis but present in conditions other than anaphylaxis.

Conclusions: Although precision needs to be improved before these algorithms could be used without human review, we demonstrated that applying rule-based and similarity-based algorithms to unstructured narrative information from clinical records can be used for classification of anaphylaxis in the Sentinel System. Further development and assessment of these methods in the Sentinel System are warranted.
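The F-measure reported in the Results is conventionally the harmonic mean of precision and recall, F = 2PR / (P + R). Assuming the authors used this balanced (F1) definition, the short Python sketch below recomputes F from the published precision and recall as a consistency check. The helper name `f_measure` is illustrative only and is not taken from the study's software. Because the published precision values are themselves rounded, the recomputed rule-based value (about 0.752) can differ from the reported 0.753 in the third decimal place; the similarity-based value agrees at 0.729.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Illustrative helper; assumes the study reports the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values as published in the abstract, expressed as fractions.
reported = {
    "rule-based": {"precision": 0.603, "recall": 1.0, "f": 0.753},
    "similarity-based": {"precision": 0.574, "recall": 1.0, "f": 0.729},
}

for name, m in reported.items():
    recomputed = f_measure(m["precision"], m["recall"])
    # Small third-decimal differences come from rounding of the published precision.
    print(f"{name}: recomputed F = {recomputed:.3f}, reported F = {m['f']:.3f}")
```

With recall fixed at 100% for both approaches, F reduces to 2P / (1 + P), which increases with precision, so the modest precision gap (60.3% vs 57.4%) accounts entirely for the small F-measure gap.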

Original language	English (US)
Pages (from-to)	1077-1084
Number of pages	8
Journal	Pharmacoepidemiology and Drug Safety
Volume	27
Issue number	10
DOIs	https://doi.org/10.1002/pds.4645
State	Published - Oct 2018
Externally published	Yes

Keywords

anaphylaxis
case classification
natural language processing
pharmacoepidemiology
sentinel system
validation

ASJC Scopus subject areas

Epidemiology
Pharmacology (medical)


Cite this

@article{06681044577f49f8b36f7c42d38c63d7,

title = "Evaluating automated approaches to anaphylaxis case classification using unstructured data from the FDA Sentinel System",

abstract = "Introduction: In May 2008, the Food and Drug Administration launched the Sentinel Initiative, a multi-year program for the establishment of a national electronic monitoring system for medical product safety that led, in 2016, to the launch of the full Sentinel System. Under the Mini-Sentinel pilot, several algorithms for identifying health outcomes of interest, including one for anaphylaxis, were developed and evaluated using data available from the Sentinel common data model. Purpose: To evaluate whether features extracted from unstructured narrative data using natural language processing (NLP) could be used to classify anaphylaxis cases. Methods: Using previously developed methods, we extracted features from unstructured narrative data using NLP and applied rule-based and similarity-based algorithms to identify anaphylaxis among 62 potential cases previously classified by human experts as anaphylaxis (N = 33), not anaphylaxis (N = 27), and unknown (N = 2). Results: The rule-based and similarity-based approaches demonstrated almost equal performance (recall 100% vs 100%, precision 60.3% vs 57.4%, F-measure: 0.753 vs 0.729). Reasons for misclassification included the inability of the algorithms to make the same clinical judgments as human experts about the timing, severity, or presence of alternative explanations; and the identification of terms consistent with anaphylaxis but present in conditions other than anaphylaxis. Conclusions: Although precision needs to be improved before these algorithms could be used without human review, we demonstrated that applying rule-based and similarity-based algorithms to unstructured narrative information from clinical records can be used for classification of anaphylaxis in the Sentinel System. Further development and assessment of these methods in the Sentinel System are warranted.",

keywords = "anaphylaxis, case classification, natural language processing, pharmacoepidemiology, sentinel system, validation",

author = "Robert Ball and Sengwee Toh and Jamie Nolan and Kevin Haynes and Richard Forshee and Taxiarchis Botsis",

note = "Funding Information: This project and the original Mini-Sentinel anaphylaxis project were funded under the Mini-Sentinel task order HHSF22301012T from the US FDA. The authors would like to thank the data partners who contributed data to this project: Harvard Pilgrim Health Care, HealthCore, Inc., Humana, Inc., HealthPartners Institute, Kaiser Permanente Colorado, Kaiser Permanente Hawaii, Kaiser Permanente Northwest, and Vanderbilt University Medical Center/Tennessee Medicaid. We are indebted to the Tennessee Division of TennCare of the Department of Finance and Administration which provided data from the Tennessee Medicaid Program. The authors would also like to thank Aarthi Iyer and Susan Forrow for their help in the current project. Robert Ball and Taxiarchis Botsis are authors on US Patent 9,075,796, ``Text mining for large medical text datasets and corresponding medical text classification using informative feature selection.'' For access to PANACEA and other algorithms used in this project please contact the FDA Technology Transfer Program at techtransfer@fda.hhs.gov. ETHER is available at https://github.com/FDA/ETHER. Publisher Copyright: {\textcopyright} 2018. This article is a U.S. Government work and is in the public domain in the USA.",

year = "2018",

month = oct,

doi = "10.1002/pds.4645",

language = "English (US)",

volume = "27",

pages = "1077--1084",

journal = "Pharmacoepidemiology and Drug Safety",

issn = "1053-8569",

publisher = "John Wiley and Sons Ltd",

number = "10",

}

TY - JOUR

T1 - Evaluating automated approaches to anaphylaxis case classification using unstructured data from the FDA Sentinel System

AU - Ball, Robert

AU - Toh, Sengwee

AU - Nolan, Jamie

AU - Haynes, Kevin

AU - Forshee, Richard

AU - Botsis, Taxiarchis

N1 - Funding Information: This project and the original Mini-Sentinel anaphylaxis project were funded under the Mini-Sentinel task order HHSF22301012T from the US FDA. The authors would like to thank the data partners who contributed data to this project: Harvard Pilgrim Health Care, HealthCore, Inc., Humana, Inc., HealthPartners Institute, Kaiser Permanente Colorado, Kaiser Permanente Hawaii, Kaiser Permanente Northwest, and Vanderbilt University Medical Center/Tennessee Medicaid. We are indebted to the Tennessee Division of TennCare of the Department of Finance and Administration which provided data from the Tennessee Medicaid Program. The authors would also like to thank Aarthi Iyer and Susan Forrow for their help in the current project. Robert Ball and Taxiarchis Botsis are authors on US Patent 9,075,796, “Text mining for large medical text datasets and corresponding medical text classification using informative feature selection.” For access to PANACEA and other algorithms used in this project please contact the FDA Technology Transfer Program at techtransfer@fda.hhs.gov. ETHER is available at https://github.com/FDA/ETHER. Publisher Copyright: © 2018. This article is a U.S. Government work and is in the public domain in the USA.

PY - 2018/10

Y1 - 2018/10

N2 - Introduction: In May 2008, the Food and Drug Administration launched the Sentinel Initiative, a multi-year program for the establishment of a national electronic monitoring system for medical product safety that led, in 2016, to the launch of the full Sentinel System. Under the Mini-Sentinel pilot, several algorithms for identifying health outcomes of interest, including one for anaphylaxis, were developed and evaluated using data available from the Sentinel common data model. Purpose: To evaluate whether features extracted from unstructured narrative data using natural language processing (NLP) could be used to classify anaphylaxis cases. Methods: Using previously developed methods, we extracted features from unstructured narrative data using NLP and applied rule-based and similarity-based algorithms to identify anaphylaxis among 62 potential cases previously classified by human experts as anaphylaxis (N = 33), not anaphylaxis (N = 27), and unknown (N = 2). Results: The rule-based and similarity-based approaches demonstrated almost equal performance (recall 100% vs 100%, precision 60.3% vs 57.4%, F-measure: 0.753 vs 0.729). Reasons for misclassification included the inability of the algorithms to make the same clinical judgments as human experts about the timing, severity, or presence of alternative explanations; and the identification of terms consistent with anaphylaxis but present in conditions other than anaphylaxis. Conclusions: Although precision needs to be improved before these algorithms could be used without human review, we demonstrated that applying rule-based and similarity-based algorithms to unstructured narrative information from clinical records can be used for classification of anaphylaxis in the Sentinel System. Further development and assessment of these methods in the Sentinel System are warranted.

AB - Introduction: In May 2008, the Food and Drug Administration launched the Sentinel Initiative, a multi-year program for the establishment of a national electronic monitoring system for medical product safety that led, in 2016, to the launch of the full Sentinel System. Under the Mini-Sentinel pilot, several algorithms for identifying health outcomes of interest, including one for anaphylaxis, were developed and evaluated using data available from the Sentinel common data model. Purpose: To evaluate whether features extracted from unstructured narrative data using natural language processing (NLP) could be used to classify anaphylaxis cases. Methods: Using previously developed methods, we extracted features from unstructured narrative data using NLP and applied rule-based and similarity-based algorithms to identify anaphylaxis among 62 potential cases previously classified by human experts as anaphylaxis (N = 33), not anaphylaxis (N = 27), and unknown (N = 2). Results: The rule-based and similarity-based approaches demonstrated almost equal performance (recall 100% vs 100%, precision 60.3% vs 57.4%, F-measure: 0.753 vs 0.729). Reasons for misclassification included the inability of the algorithms to make the same clinical judgments as human experts about the timing, severity, or presence of alternative explanations; and the identification of terms consistent with anaphylaxis but present in conditions other than anaphylaxis. Conclusions: Although precision needs to be improved before these algorithms could be used without human review, we demonstrated that applying rule-based and similarity-based algorithms to unstructured narrative information from clinical records can be used for classification of anaphylaxis in the Sentinel System. Further development and assessment of these methods in the Sentinel System are warranted.

KW - anaphylaxis

KW - case classification

KW - natural language processing

KW - pharmacoepidemiology

KW - sentinel system

KW - validation

UR - http://www.scopus.com/inward/record.url?scp=85052817126&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85052817126&partnerID=8YFLogxK

U2 - 10.1002/pds.4645

DO - 10.1002/pds.4645

M3 - Article

C2 - 30152575

AN - SCOPUS:85052817126

SN - 1053-8569

VL - 27

SP - 1077

EP - 1084

JO - Pharmacoepidemiology and Drug Safety

JF - Pharmacoepidemiology and Drug Safety

IS - 10

ER -
