Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications

Guergana K. Savova, James J. Masanz, Philip V. Ogren, Jiaping Zheng, Sunghwan Sohn, Karin C. Kipper-Schuler, Christopher Chute

Research output: Contribution to journal (Article)

Abstract

We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies: the Unstructured Information Management Architecture framework and the OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans. Accuracy for the concept mapping, negation, and status attributes is 0.957, 0.943, and 0.859 for exact spans and 0.580, 0.939, and 0.839 for overlapping spans, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.
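The abstract reports named entity F-scores under two matching criteria, exact and overlapping spans. The following is a minimal sketch of how the two criteria can yield different scores. It assumes spans are (start, end) character offsets with an exclusive end, and it uses invented toy data and hypothetical helper names (`exact`, `overlaps`, `f_score`); it is an illustration only, not the evaluation code used in the paper.

```python
from typing import Callable, List, Tuple

# Assumption: a span is a (start, end) pair of character offsets, end exclusive.
Span = Tuple[int, int]


def exact(a: Span, b: Span) -> bool:
    """Exact match: both boundaries must agree."""
    return a == b


def overlaps(a: Span, b: Span) -> bool:
    """Overlapping match: the spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def f_score(gold: List[Span], predicted: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-score (harmonic mean of precision and recall)."""
    if not gold or not predicted:
        return 0.0
    # Precision: fraction of predicted spans that match some gold span.
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    # Recall: fraction of gold spans that match some predicted span.
    matched_gold = sum(any(match(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted)
    recall = matched_gold / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): one exact hit, one partial hit, one miss.
gold = [(0, 8), (15, 27), (40, 52)]
predicted = [(0, 8), (15, 22), (60, 70)]

exact_f = f_score(gold, predicted, exact)
overlap_f = f_score(gold, predicted, overlaps)
print(f"exact F-score:   {exact_f:.3f}")
print(f"overlap F-score: {overlap_f:.3f}")
```

In this toy case the partially matching span counts only under the overlapping criterion, so the overlapping F-score is higher than the exact one, the same direction as the 0.715 versus 0.824 reported above.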

Original language: English (US)
Pages (from-to): 507-513
Number of pages: 7
Journal: Journal of the American Medical Informatics Association
Volume: 17
Issue number: 5
DOI: 10.1136/jamia.2009.001560
State: Published - Sep 2010
Externally published: Yes

Fingerprint

Natural Language Processing
Semantics
Information Management
Electronic Health Records
Information Storage and Retrieval
Linguistics
Technology

ASJC Scopus subject areas

  • Health Informatics
  • Medicine(all)

Cite this

Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications. / Savova, Guergana K.; Masanz, James J.; Ogren, Philip V.; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C.; Chute, Christopher.

In: Journal of the American Medical Informatics Association, Vol. 17, No. 5, 09.2010, p. 507-513.

@article{0f875f03dd89452daa14d04c2a1c9b9b,
title = "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): Architecture, component evaluation and applications",
abstract = "We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies - the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.",
author = "Savova, {Guergana K.} and Masanz, {James J.} and Ogren, {Philip V.} and Jiaping Zheng and Sunghwan Sohn and Kipper-Schuler, {Karin C.} and Christopher Chute",
year = "2010",
month = "9",
doi = "10.1136/jamia.2009.001560",
language = "English (US)",
volume = "17",
pages = "507--513",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES)

T2 - Architecture, component evaluation and applications

AU - Savova, Guergana K.

AU - Masanz, James J.

AU - Ogren, Philip V.

AU - Zheng, Jiaping

AU - Sohn, Sunghwan

AU - Kipper-Schuler, Karin C.

AU - Chute, Christopher

PY - 2010/9

Y1 - 2010/9

N2 - We aim to build and evaluate an open-source natural language processing system for information extraction from electronic medical record clinical free-text. We describe and evaluate our system, the clinical Text Analysis and Knowledge Extraction System (cTAKES), released open-source at http://www.ohnlp.org. The cTAKES builds on existing open-source technologies - the Unstructured Information Management Architecture framework and OpenNLP natural language processing toolkit. Its components, specifically trained for the clinical domain, create rich linguistic and semantic annotations. Performance of individual components: sentence boundary detector accuracy=0.949; tokenizer accuracy=0.949; part-of-speech tagger accuracy=0.936; shallow parser F-score=0.924; named entity recognizer and system-level evaluation F-score=0.715 for exact and 0.824 for overlapping spans, and accuracy for concept mapping, negation, and status attributes for exact and overlapping spans of 0.957, 0.943, 0.859, and 0.580, 0.939, and 0.839, respectively. Overall performance is discussed against five applications. The cTAKES annotations are the foundation for methods and modules for higher-level semantic processing of clinical free-text.

UR - http://www.scopus.com/inward/record.url?scp=78149490620&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78149490620&partnerID=8YFLogxK

U2 - 10.1136/jamia.2009.001560

DO - 10.1136/jamia.2009.001560

M3 - Article

C2 - 20819853

AN - SCOPUS:78149490620

VL - 17

SP - 507

EP - 513

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -