A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications

Maryam Zolnoori, Kin Wah Fung, Timothy B. Patrick, Paul Fontelo, Hadi H K Kharrazi, Anthony Faiola, Yi Shuan Shirley Wu, Christina E. Eldredge, Jake Luo, Mike Conway, Jiaxi Zhu, Soo Kyung Park, Kelly Xu, Hamideh Moayyed, Somaieh Goudarzvand

Research output: Contribution to journal › Article

Abstract

The “Psychiatric Treatment Adverse Reactions” (PsyTAR) corpus is an annotated corpus developed from patients’ narrative data on psychiatric medications, particularly SSRIs (selective serotonin reuptake inhibitors) and SNRIs (serotonin norepinephrine reuptake inhibitors). The corpus consists of three main components: sentence classification, entity identification, and entity normalization. We split the review posts into sentences and labeled them for the presence of adverse drug reactions (ADRs) (2168 sentences), withdrawal symptoms (WDs) (438 sentences), signs/symptoms/illness (SSIs) (789 sentences), drug indications (DIs) (517 sentences), drug effectiveness (EF) (1087 sentences), and drug ineffectiveness (INF) (337 sentences). In the entity identification phase, we identified and extracted ADRs (4813 mentions), WDs (590 mentions), SSIs (1219 mentions), and DIs (792 mentions). In the entity normalization phase, we mapped the identified entities to the corresponding concepts in both UMLS (918 unique concepts) and SNOMED CT (755 unique concepts). Four annotators double-coded the sentences and the spans of the identified entities, strictly following annotation guidelines developed for this study. We used the PsyTAR sentence classification component to train a range of supervised machine learning classifiers to automatically identify text segments that mention ADRs, WDs, DIs, SSIs, EF, and INF. SVM classifiers performed best, with an F-score of 0.90. We also measured the performance of cTAKES (clinical Text Analysis and Knowledge Extraction System) in identifying patients’ expressions of ADRs and WDs with and without adding the PsyTAR dictionary to the core cTAKES dictionary. Augmenting the cTAKES dictionary with PsyTAR improved the F-score of cTAKES by 25%.
The findings suggest that PsyTAR is valuable for text mining algorithms that aim to identify information about adverse drug events and drug effectiveness in patients’ narrative data, because it links patients’ expressions of adverse drug events to standard medical vocabularies. The corpus is publicly available at Zolnoori et al. [30].
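The dictionary-augmentation evaluation described above can be illustrated with a minimal sketch. It is not cTAKES and not the authors' pipeline: the substring matcher, the function names `tag_mentions` and `f_score`, the three sentences, and both dictionaries are hypothetical stand-ins chosen only to show how adding lay expressions to a core dictionary can raise the F-score of a dictionary lookup.

```python
def f_score(predicted, gold):
    """F1 score over two sets of (sentence_index, concept) mentions."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def tag_mentions(sentences, dictionary):
    """Naive lookup: report a concept when its phrase occurs in a sentence."""
    found = set()
    for index, sentence in enumerate(sentences):
        text = sentence.lower()
        for phrase, concept in dictionary.items():
            if phrase in text:
                found.add((index, concept))
    return found


# Hypothetical toy data, NOT taken from the PsyTAR corpus.
sentences = [
    "I had bad insomnia for the first two weeks.",
    "Could not sleep at all and felt dizzy.",
    "I felt really tired all day.",
]
gold = {
    (0, "insomnia"),
    (1, "insomnia"),
    (1, "dizziness"),
    (2, "fatigue"),
}

# Core dictionary with clinical terms only (assumed for illustration).
core_dictionary = {"insomnia": "insomnia", "dizziness": "dizziness"}

# Lay expressions mapped to the same concepts (assumed for illustration).
lay_terms = {
    "could not sleep": "insomnia",
    "dizzy": "dizziness",
    "really tired": "fatigue",
}
augmented_dictionary = {**core_dictionary, **lay_terms}

baseline_f = f_score(tag_mentions(sentences, core_dictionary), gold)
augmented_f = f_score(tag_mentions(sentences, augmented_dictionary), gold)

print(f"core dictionary F-score:      {baseline_f:.2f}")
print(f"augmented dictionary F-score: {augmented_f:.2f}")
```

In this toy setting the core dictionary only matches the one sentence that uses the clinical term, so recall is low; the lay expressions recover the remaining mentions. The size of the gain here reflects the made-up data only and says nothing about the 25% improvement reported for cTAKES.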

Original language: English (US)
Article number: 103091
Journal: Journal of Biomedical Informatics
Volume: 90
DOI: https://doi.org/10.1016/j.jbi.2018.12.005
State: Published - Feb 1 2019

Fingerprint

Norepinephrine
Serotonin Uptake Inhibitors
Drug-Related Side Effects and Adverse Reactions
Psychiatry
Glossaries
Signs and Symptoms
Pharmaceutical Preparations
Classifiers
Systematized Nomenclature of Medicine
Unified Medical Language System
Therapeutics
Substance Withdrawal Syndrome
Data Mining
Vocabulary
Learning systems
Serotonin and Noradrenaline Reuptake Inhibitors
Guidelines

Keywords

  • Adverse drug events
  • Annotated corpus
  • Drug effectiveness
  • Drug safety
  • Information extraction
  • Machine learning
  • Online healthcare forums
  • Patients narratives
  • Psychiatric medications
  • Semantic mapping
  • SNOMED CT
  • SNRIs
  • Social media
  • SSRIs
  • Text mining
  • UMLS

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

A systematic approach for developing a corpus of patient reported adverse drug events : A case study for SSRI and SNRI medications. / Zolnoori, Maryam; Fung, Kin Wah; Patrick, Timothy B.; Fontelo, Paul; Kharrazi, Hadi H K; Faiola, Anthony; Wu, Yi Shuan Shirley; Eldredge, Christina E.; Luo, Jake; Conway, Mike; Zhu, Jiaxi; Park, Soo Kyung; Xu, Kelly; Moayyed, Hamideh; Goudarzvand, Somaieh.

In: Journal of Biomedical Informatics, Vol. 90, 103091, 01.02.2019.

Zolnoori, M, Fung, KW, Patrick, TB, Fontelo, P, Kharrazi, HHK, Faiola, A, Wu, YSS, Eldredge, CE, Luo, J, Conway, M, Zhu, J, Park, SK, Xu, K, Moayyed, H & Goudarzvand, S 2019, 'A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications', Journal of Biomedical Informatics, vol. 90, 103091. https://doi.org/10.1016/j.jbi.2018.12.005
Zolnoori, Maryam ; Fung, Kin Wah ; Patrick, Timothy B. ; Fontelo, Paul ; Kharrazi, Hadi H K ; Faiola, Anthony ; Wu, Yi Shuan Shirley ; Eldredge, Christina E. ; Luo, Jake ; Conway, Mike ; Zhu, Jiaxi ; Park, Soo Kyung ; Xu, Kelly ; Moayyed, Hamideh ; Goudarzvand, Somaieh. / A systematic approach for developing a corpus of patient reported adverse drug events : A case study for SSRI and SNRI medications. In: Journal of Biomedical Informatics. 2019 ; Vol. 90.
@article{0315e605a88b4815be8ca77daf20d1c8,
title = "A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications",
keywords = "Adverse drug events, Annotated corpus, Drug effectiveness, Drug safety, Information extraction, Machine learning, Online healthcare forums, Patients narratives, Psychiatric medications, Semantic mapping, SNOMED CT, SNRIs, Social media, SSRIs, Text mining, UMLS",
author = "Maryam Zolnoori and Fung, {Kin Wah} and Patrick, {Timothy B.} and Paul Fontelo and Kharrazi, {Hadi H K} and Anthony Faiola and Wu, {Yi Shuan Shirley} and Eldredge, {Christina E.} and Jake Luo and Mike Conway and Jiaxi Zhu and Park, {Soo Kyung} and Kelly Xu and Hamideh Moayyed and Somaieh Goudarzvand",
year = "2019",
month = "2",
day = "1",
doi = "10.1016/j.jbi.2018.12.005",
language = "English (US)",
volume = "90",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - A systematic approach for developing a corpus of patient reported adverse drug events

T2 - A case study for SSRI and SNRI medications

AU - Zolnoori, Maryam

AU - Fung, Kin Wah

AU - Patrick, Timothy B.

AU - Fontelo, Paul

AU - Kharrazi, Hadi H K

AU - Faiola, Anthony

AU - Wu, Yi Shuan Shirley

AU - Eldredge, Christina E.

AU - Luo, Jake

AU - Conway, Mike

AU - Zhu, Jiaxi

AU - Park, Soo Kyung

AU - Xu, Kelly

AU - Moayyed, Hamideh

AU - Goudarzvand, Somaieh

PY - 2019/2/1

Y1 - 2019/2/1

KW - Adverse drug events

KW - Annotated corpus

KW - Drug effectiveness

KW - Drug safety

KW - Information extraction

KW - Machine learning

KW - Online healthcare forums

KW - Patients narratives

KW - Psychiatric medications

KW - Semantic mapping

KW - SNOMED CT

KW - SNRIs

KW - Social media

KW - SSRIs

KW - Text mining

KW - UMLS

UR - http://www.scopus.com/inward/record.url?scp=85060100705&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060100705&partnerID=8YFLogxK

U2 - 10.1016/j.jbi.2018.12.005

DO - 10.1016/j.jbi.2018.12.005

M3 - Article

C2 - 30611893

AN - SCOPUS:85060100705

VL - 90

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

M1 - 103091

ER -