Generation of an annotated reference standard for vaccine adverse event reports

Matthew Foster, Abhishek Pandey, Kory Kreimeyer, Taxiarchis Botsis

Research output: Contribution to journal › Article

Abstract

As part of a collaborative project between the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention for the development of a web-based natural language processing (NLP) workbench, we created a corpus of 1,000 Vaccine Adverse Event Reporting System (VAERS) reports annotated for 36,726 clinical features, 13,365 temporal features, and 22,395 clinical-temporal links. This paper describes the final corpus, as well as the methodology used to create it, so that clinical NLP researchers outside FDA can evaluate the utility of the corpus to aid their own work. The creation of this standard went through four phases: pre-training, pre-production, production-clinical feature annotation, and production-temporal annotation. The pre-production phase used a strategy of double annotation followed by adjudication to refine and finalize the annotation model, while the production phases followed a single-annotation strategy to maximize the number of reports in the corpus. An analysis of 30 reports randomly selected as part of a quality control assessment yielded accuracies of 0.97, 0.96, and 0.83 for clinical features, temporal features, and clinical-temporal associations, respectively, which speaks to the quality of the corpus.

Original language: English (US)
Pages (from-to): 4325-4330
Number of pages: 6
Journal: Vaccine
Volume: 36
Issue number: 29
DOI: 10.1016/j.vaccine.2018.05.079
State: Published - Jul 5 2018

Fingerprint

reference standards
Vaccines
Centers for Disease Control and Prevention (U.S.)
Natural Language Processing
Quality Control
United States Food and Drug Administration
researchers
Research Personnel
methodology

Keywords

  • Annotation
  • Corpus
  • NLP
  • Reference
  • VAERS

ASJC Scopus subject areas

  • Molecular Medicine
  • Immunology and Microbiology (all)
  • Veterinary (all)
  • Public Health, Environmental and Occupational Health
  • Infectious Diseases

Cite this

Generation of an annotated reference standard for vaccine adverse event reports. / Foster, Matthew; Pandey, Abhishek; Kreimeyer, Kory; Botsis, Taxiarchis.

In: Vaccine, Vol. 36, No. 29, 05.07.2018, p. 4325-4330.

@article{0724c485c4e347e88f68ae9067f51073,
title = "Generation of an annotated reference standard for vaccine adverse event reports",
keywords = "Annotation, Corpus, NLP, Reference, VAERS",
author = "Matthew Foster and Abhishek Pandey and Kory Kreimeyer and Taxiarchis Botsis",
year = "2018",
month = "7",
day = "5",
doi = "10.1016/j.vaccine.2018.05.079",
language = "English (US)",
volume = "36",
pages = "4325--4330",
journal = "Vaccine",
issn = "0264-410X",
publisher = "Elsevier BV",
number = "29",

}

TY - JOUR

T1 - Generation of an annotated reference standard for vaccine adverse event reports

AU - Foster, Matthew

AU - Pandey, Abhishek

AU - Kreimeyer, Kory

AU - Botsis, Taxiarchis

PY - 2018/7/5

Y1 - 2018/7/5

KW - Annotation

KW - Corpus

KW - NLP

KW - Reference

KW - VAERS

UR - http://www.scopus.com/inward/record.url?scp=85048522236&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048522236&partnerID=8YFLogxK

U2 - 10.1016/j.vaccine.2018.05.079

DO - 10.1016/j.vaccine.2018.05.079

M3 - Article

C2 - 29880244

AN - SCOPUS:85048522236

VL - 36

SP - 4325

EP - 4330

JO - Vaccine

JF - Vaccine

SN - 0264-410X

IS - 29

ER -