Automating case definitions using literature-based reasoning

Research output: Contribution to journal › Article

Abstract

Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps, which are a major source of inefficiency in surveillance and research.

Objective: To describe the need for, and propose an approach to, automating the useful representation of CDefs for medical conditions.

Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis, relying mostly on the identification of synonyms for the CDef criteria with the NLM MetaMap tool. We also generated a CDef for the same condition from all related PubMed abstracts, processing them with a text mining tool and then treating the synonyms with the same strategy. Sentence-level co-occurrences of anaphylaxis with any other medical term in the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region, whose nodes were used to represent the key criteria of the CDef. We evaluated the ability of the "translated" and the "generated" CDefs to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches, and compared them with our previous semi-automated classification approach.

Results: Overall classification performance was similar across the approaches to producing CDefs. The generated CDef with the vector space model and cosine similarity had the highest accuracy (0.825±0.003), and the semi-automated approach with the vector space model and cosine similarity had the highest recall (0.809±0.042). Precision was low for all approaches.

Conclusion: The useful representation of CDefs is a complicated task, but it potentially offers substantial gains in efficiency to support safety and clinical surveillance.
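
The abstract outlines a pipeline: build a sentence-level co-occurrence network of medical terms, reduce it to a dense region around the target condition, and classify reports by similarity to the resulting CDef terms. The Python sketch below illustrates that general shape only, under stated assumptions. Sentences and reports are assumed to be already reduced to sets or lists of normalized concept terms (MetaMap output is not modeled). The 'islands' algorithm is replaced by a plainly simpler stand-in, an edge-weight cut followed by taking the connected component of the seed term, which is not the islands algorithm itself. All function names, the weight cut, and the similarity threshold are hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations
import math


def build_cooccurrence_network(sentences):
    """Count how often each pair of concept terms appears in the same sentence.

    Assumption: each sentence has already been mapped to concept terms.
    Edge keys are alphabetically sorted term pairs.
    """
    edges = Counter()
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


def dense_region(edges, seed, min_weight):
    """Hypothetical stand-in for the 'islands' reduction (NOT that algorithm):
    drop edges lighter than min_weight, then return the connected component
    that contains the seed term."""
    adjacency = {}
    for (a, b), weight in edges.items():
        if weight >= min_weight:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    if seed not in adjacency:
        return {seed}
    region, stack = set(), [seed]
    while stack:  # depth-first traversal of the thresholded graph
        node = stack.pop()
        if node not in region:
            region.add(node)
            stack.extend(adjacency[node] - region)
    return region


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def classify(reports, cdef_terms, threshold):
    """Vector space model: flag a report as a candidate case when its cosine
    similarity to the CDef term vector reaches the (assumed) threshold."""
    cdef_vector = Counter(cdef_terms)
    return [cosine(Counter(report), cdef_vector) >= threshold for report in reports]


def evaluate(predicted, actual):
    """Accuracy, precision, and recall of boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    tn = len(actual) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

With toy sentences such as {"anaphylaxis", "hypotension", "urticaria"} and {"anaphylaxis", "urticaria", "wheezing"}, a weight cut of 2 keeps only the strongest links to "anaphylaxis", and the surviving terms act as the CDef vector for scoring each report. A real system would tune the cut and threshold on labeled data, which is where the low precision reported above would show up.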

Original language: English (US)
Pages (from-to): 515-527
Number of pages: 13
Journal: Applied Clinical Informatics
Volume: 4
Issue number: 4
DOIs: 10.4338/ACI-2013-04-RA-0028
State: Published - Dec 1 2013
Externally published: Yes

Keywords

  • Anaphylaxis
  • Case definition
  • Literature-based reasoning
  • Safety surveillance
  • Semantic networks
  • Similarity

ASJC Scopus subject areas

  • Health Informatics
  • Computer Science Applications
  • Health Information Management

Cite this

Automating case definitions using literature-based reasoning. / Botsis, Taxiarchis; Ball, R.

In: Applied Clinical Informatics, Vol. 4, No. 4, 01.12.2013, p. 515-527.

@article{46016fd9dfbe46799c26929f76535d2e,
title = "Automating case definitions using literature-based reasoning",
abstract = "Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps and this is a major source of inefficiency in surveillance and research. Objective: Describe the need and propose an approach for automating the useful representation of CDefs for medical conditions. Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis by mostly relying on the identification of synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition using all the related PubMed abstracts, processing them with a text mining tool, and further treating the synonyms with the above strategy. The co-occurrence of the anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region including the nodes that were used to represent the key criteria of the CDef. We evaluated the ability of the {"}translated{"} and the {"}generated{"} CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches and comparing them with our previous semi-automated classification approach. Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches. Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.",
keywords = "Anaphylaxis, Case definition, Literature-based reasoning, Safety surveillance, Semantic networks, Similarity",
author = "Taxiarchis Botsis and R. Ball",
year = "2013",
month = "12",
day = "1",
doi = "10.4338/ACI-2013-04-RA-0028",
language = "English (US)",
volume = "4",
pages = "515--527",
journal = "Applied Clinical Informatics",
issn = "1869-0327",
publisher = "Schattauer GmbH",
number = "4",

}

TY  - JOUR
T1  - Automating case definitions using literature-based reasoning
AU  - Botsis, Taxiarchis
AU  - Ball, R.
PY  - 2013/12/1
Y1  - 2013/12/1
N2  - Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps and this is a major source of inefficiency in surveillance and research. Objective: Describe the need and propose an approach for automating the useful representation of CDefs for medical conditions. Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis by mostly relying on the identification of synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition using all the related PubMed abstracts, processing them with a text mining tool, and further treating the synonyms with the above strategy. The co-occurrence of the anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region including the nodes that were used to represent the key criteria of the CDef. We evaluated the ability of the "translated" and the "generated" CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches and comparing them with our previous semi-automated classification approach. Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches. Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.
AB  - Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical, surveillance, and research activities. The application of CDefs still relies on manual steps and this is a major source of inefficiency in surveillance and research. Objective: Describe the need and propose an approach for automating the useful representation of CDefs for medical conditions. Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis by mostly relying on the identification of synonyms for the criteria of the CDef using the NLM MetaMap tool. We also generated a CDef for the same condition using all the related PubMed abstracts, processing them with a text mining tool, and further treating the synonyms with the above strategy. The co-occurrence of the anaphylaxis and any other medical term within the same sentence of the abstracts supported the construction of a large semantic network. The 'islands' algorithm reduced the network and revealed its densest region including the nodes that were used to represent the key criteria of the CDef. We evaluated the ability of the "translated" and the "generated" CDef to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches and comparing them with our previous semi-automated classification approach. Results: Overall classification performance across approaches to producing CDefs was similar, with the generated CDef and vector space model with cosine similarity having the highest accuracy (0.825±0.003) and the semi-automated approach and vector space model with cosine similarity having the highest recall (0.809±0.042). Precision was low for all approaches. Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial gains in efficiency to support safety and clinical surveillance.
KW  - Anaphylaxis
KW  - Case definition
KW  - Literature-based reasoning
KW  - Safety surveillance
KW  - Semantic networks
KW  - Similarity
UR  - http://www.scopus.com/inward/record.url?scp=84893067989&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84893067989&partnerID=8YFLogxK
U2  - 10.4338/ACI-2013-04-RA-0028
DO  - 10.4338/ACI-2013-04-RA-0028
M3  - Article
C2  - 24454579
AN  - SCOPUS:84893067989
VL  - 4
SP  - 515
EP  - 527
JO  - Applied Clinical Informatics
JF  - Applied Clinical Informatics
SN  - 1869-0327
IS  - 4
ER  -