TY - JOUR
T1 - Generation of an annotated reference standard for vaccine adverse event reports
AU - Foster, Matthew
AU - Pandey, Abhishek
AU - Kreimeyer, Kory
AU - Botsis, Taxiarchis
N1 - Funding Information:
This work was supported by the Office of the Secretary Patient-Centered Outcomes Research Trust Fund under Interagency Agreement #750116PE060014 and in part by the appointment of Matthew Foster, Abhishek Pandey, and Kory Kreimeyer to the Research Participation Program administered by ORISE through an interagency agreement between the US Department of Energy and the US FDA. We would also like to thank Wei Wang from the Engility Corporation, who was essential in developing the ETHER annotation capabilities.
Publisher Copyright:
© 2018 Elsevier Ltd
PY - 2018/7/5
Y1 - 2018/7/5
AB - As part of a collaborative project between the US Food and Drug Administration (FDA) and the Centers for Disease Control and Prevention for the development of a web-based natural language processing (NLP) workbench, we created a corpus of 1000 Vaccine Adverse Event Reporting System (VAERS) reports annotated for 36,726 clinical features, 13,365 temporal features, and 22,395 clinical-temporal links. This paper describes the final corpus, as well as the methodology used to create it, so that clinical NLP researchers outside FDA can evaluate the utility of the corpus to aid their own work. The creation of this standard went through four phases: pre-training, pre-production, production-clinical feature annotation, and production-temporal annotation. The pre-production phase used double annotation followed by adjudication to refine and finalize the annotation model, while the production phases followed a single annotation strategy to maximize the number of reports in the corpus. An analysis of 30 reports randomly selected as part of a quality control assessment yielded accuracies of 0.97, 0.96, and 0.83 for clinical features, temporal features, and clinical-temporal associations, respectively, which speaks to the quality of the corpus.
KW - Annotation
KW - Corpus
KW - NLP
KW - Reference
KW - VAERS
UR - http://www.scopus.com/inward/record.url?scp=85048522236&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048522236&partnerID=8YFLogxK
U2 - 10.1016/j.vaccine.2018.05.079
DO - 10.1016/j.vaccine.2018.05.079
M3 - Article
C2 - 29880244
AN - SCOPUS:85048522236
SN - 0264-410X
VL - 36
SP - 4325
EP - 4330
JO - Vaccine
JF - Vaccine
IS - 29
ER -