Text mining for the vaccine adverse event reporting system: Medical text classification using informative feature selection

Taxiarchis Botsis; Michael D. Nguyen; Emily Jane Woo; Marianthi Markatou; Robert Ball

doi:10.1136/amiajnl-2010-000022

Research output: Contribution to journal › Article › peer-review

60 Scopus citations

Abstract

Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload.

Design: We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N_pos = 237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations.

Measurements: Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed.

Results: Rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, however at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively).

Conclusion: Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
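The abstract reports macro-averaged recall, precision, and F-measure but does not spell out the computation. The following is a minimal sketch of one common definition, not the authors' implementation: each metric is computed per class and then averaged with equal weight per class, so the rare positive class counts as much as the large negative class. The function name `macro_scores`, the label strings, and the toy data are illustrative assumptions. Macro F is taken here as the mean of per-class F values; some sources instead compute F from the macro-averaged precision and recall.

```python
from typing import Hashable, Sequence, Tuple


def macro_scores(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Tuple[float, float, float]:
    """Return (macro precision, macro recall, macro F) over all classes.

    Hypothetical helper for illustration only.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    labels = set(y_true) | set(y_pred)
    precisions, recalls, f_scores = [], [], []

    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        # Undefined ratios (zero denominator) are treated as 0.0 here.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0

        precisions.append(precision)
        recalls.append(recall)
        f_scores.append(f_score)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_scores) / n


# Toy labels, not data from the study.
truth = ["pos", "pos", "neg", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg", "neg", "pos"]
p, r, f = macro_scores(truth, predicted)
print(f"macro precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

With a class imbalance like the one described (237 potentially positive reports out of 6034), macro-averaging keeps a classifier that labels everything negative from scoring well, which is one reason it is often preferred over plain accuracy in this setting.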

Original language	English (US)
Pages (from-to)	631-638
Number of pages	8
Journal	Journal of the American Medical Informatics Association
Volume	18
Issue number	5
DOIs	https://doi.org/10.1136/amiajnl-2010-000022
State	Published - Sep 2011
Externally published	Yes

ASJC Scopus subject areas

Health Informatics

Cite this

@article{94d8097fab27498794f1ef449e25684f,
title = "Text mining for the vaccine adverse event reporting system: Medical text classification using informative feature selection",
abstract = "Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. Design: We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N\_pos = 237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Measurements: Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed. Results: Rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, however at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Conclusion: Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.",
author = "Taxiarchis Botsis and Nguyen, {Michael D.} and Woo, {Emily Jane} and Marianthi Markatou and Robert Ball",
year = "2011",
month = sep,
doi = "10.1136/amiajnl-2010-000022",
language = "English (US)",
volume = "18",
pages = "631--638",
journal = "Journal of the American Medical Informatics Association",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",
}

TY - JOUR
T1 - Text mining for the vaccine adverse event reporting system
T2 - Medical text classification using informative feature selection
AU - Botsis, Taxiarchis
AU - Nguyen, Michael D.
AU - Woo, Emily Jane
AU - Markatou, Marianthi
AU - Ball, Robert
PY - 2011/9
Y1 - 2011/9
N2 - Objective: The US Vaccine Adverse Event Reporting System (VAERS) collects spontaneous reports of adverse events following vaccination. Medical officers review the reports and often apply standardized case definitions, such as those developed by the Brighton Collaboration. Our objective was to demonstrate a multi-level text mining approach for automated text classification of VAERS reports that could potentially reduce human workload. Design: We selected 6034 VAERS reports for H1N1 vaccine that were classified by medical officers as potentially positive (N_pos = 237) or negative for anaphylaxis. We created a categorized corpus of text files that included the class label and the symptom text field of each report. A validation set of 1100 labeled text files was also used. Text mining techniques were applied to extract three feature sets for important keywords, low- and high-level patterns. A rule-based classifier processed the high-level feature representation, while several machine learning classifiers were trained for the remaining two feature representations. Measurements: Classifiers' performance was evaluated by macro-averaging recall, precision, and F-measure, and Friedman's test; misclassification error rate analysis was also performed. Results: Rule-based classifier, boosted trees, and weighted support vector machines performed well in terms of macro-recall, however at the expense of a higher mean misclassification error rate. The rule-based classifier performed very well in terms of average sensitivity and specificity (79.05% and 94.80%, respectively). Conclusion: Our validated results showed the possibility of developing effective medical text classifiers for VAERS reports by combining text mining with informative feature selection; this strategy has the potential to reduce reviewer workload considerably.
UR - http://www.scopus.com/inward/record.url?scp=80053260063&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053260063&partnerID=8YFLogxK
U2 - 10.1136/amiajnl-2010-000022
DO - 10.1136/amiajnl-2010-000022
M3 - Article
C2 - 21709163
AN - SCOPUS:80053260063
SN - 1067-5027
VL - 18
SP - 631
EP - 638
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 5
ER -
