Extraction of geriatric syndromes from electronic health record clinical notes: Assessment of statistical natural language processing methods

Tao Chen; Mark Dredze; Jonathan P. Weiner; Leilani Hernandez; Joe Kimura; Hadi Kharrazi

doi:10.2196/13039

Bloomberg School of Public Health

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records.

Objective: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective.

Methods: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts.

Results: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context.

Conclusions: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.
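The abstract reports both a macroaverage and a microaverage F1. As a rough illustration of how the two differ, the sketch below computes both from per-construct counts. The construct names and counts are hypothetical toy values, not data from the study, and the function names are assumptions made for this example only.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one construct
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Average the per-construct F1 scores, weighting every construct equally."""
    return sum(f1(*c) for c in counts.values()) / len(counts)


def micro_f1(counts: Dict[str, Counts]) -> float:
    """Pool the counts over all constructs first, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn)


# Hypothetical toy counts: one frequent, well-detected construct
# and one rare, poorly detected construct.
toy_counts: Dict[str, Counts] = {
    "construct_a": (90, 10, 10),
    "construct_b": (2, 8, 10),
}

macro = macro_f1(toy_counts)
micro = micro_f1(toy_counts)
print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")
```

In this toy case the rare, poorly detected construct pulls the macroaverage down more than the microaverage, because the microaverage is dominated by the frequent construct. This is one possible reason a microaverage can exceed a macroaverage when performance varies by construct.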

Original language	English (US)
Article number	e13039
Journal	JMIR Medical Informatics
Volume	7
Issue number	1
DOIs	https://doi.org/10.2196/13039
State	Published - Jan 1 2019

Keywords

Clinical notes
Conditional random fields
Geriatrics
Information extraction
Natural language processing

ASJC Scopus subject areas

Health Informatics
Health Information Management

Cite this

@article{1dfe0e743a8842a2a5f05d0afc0f84df,

title = "Extraction of geriatric syndromes from electronic health record clinical notes: Assessment of statistical natural language processing methods",

abstract = "Background: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. Objective: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. Methods: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. Results: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. Conclusions: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. 
This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.",

keywords = "Clinical notes, Conditional random fields, Geriatrics, Information extraction, Natural language processing",

author = "Tao Chen and Mark Dredze and Weiner, {Jonathan P.} and Leilani Hernandez and Joe Kimura and Hadi Kharrazi",

note = "Funding Information: This work was supported by Atrius Health and the Center for Population Health IT, Johns Hopkins University. Publisher Copyright: {\textcopyright} 2021 JMIR Publications Inc. All rights reserved.",

year = "2019",

month = jan,

day = "1",

doi = "10.2196/13039",

language = "English (US)",

volume = "7",

journal = "JMIR Medical Informatics",

issn = "2291-9694",

publisher = "JMIR Publications Inc.",

number = "1",

}

TY - JOUR

T1 - Extraction of geriatric syndromes from electronic health record clinical notes

T2 - Assessment of statistical natural language processing methods

AU - Chen, Tao

AU - Dredze, Mark

AU - Weiner, Jonathan P.

AU - Hernandez, Leilani

AU - Kimura, Joe

AU - Kharrazi, Hadi

N1 - Funding Information: This work was supported by Atrius Health and the Center for Population Health IT, Johns Hopkins University. Publisher Copyright: © 2021 JMIR Publications Inc. All rights reserved.

PY - 2019/1/1

Y1 - 2019/1/1

AB - Background: Geriatric syndromes in older adults are associated with adverse outcomes. However, despite being reported in clinical notes, these syndromes are often poorly captured by diagnostic codes in the structured fields of electronic health records (EHRs) or administrative records. Objective: We aim to automatically determine if a patient has any geriatric syndromes by mining the free text of associated EHR clinical notes. We assessed which statistical natural language processing (NLP) techniques are most effective. Methods: We applied conditional random fields (CRFs), a widely used machine learning algorithm, to identify each of 10 geriatric syndrome constructs in a clinical note. We assessed three sets of features and attributes for CRF operations: a base set, enhanced token, and contextual features. We trained the CRF on 3901 manually annotated notes from 85 patients, tuned the CRF on a validation set of 50 patients, and evaluated it on 50 held-out test patients. These notes were from a group of US Medicare patients over 65 years of age enrolled in a Medicare Advantage Health Maintenance Organization and cared for by a large group practice in Massachusetts. Results: A final feature set was formed through comprehensive feature ablation experiments. The final CRF model performed well at patient-level determination (macroaverage F1=0.834, microaverage F1=0.851); however, performance varied by construct. For example, at phrase-partial evaluation, the CRF model worked well on constructs such as absence of fecal control (F1=0.857) and vision impairment (F1=0.798) but poorly on malnutrition (F1=0.155), weight loss (F1=0.394), and severe urinary control issues (F1=0.532). Errors were primarily due to previously unobserved words (ie, out-of-vocabulary) and a lack of context. Conclusions: This study shows that statistical NLP can be used to identify geriatric syndromes from EHR-extracted clinical notes. This creates new opportunities to identify patients with geriatric syndromes and study their health outcomes.

KW - Clinical notes

KW - Conditional random fields

KW - Geriatrics

KW - Information extraction

KW - Natural language processing

UR - http://www.scopus.com/inward/record.url?scp=85097201075&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85097201075&partnerID=8YFLogxK

U2 - 10.2196/13039

DO - 10.2196/13039

M3 - Article

AN - SCOPUS:85097201075

SN - 2291-9694

VL - 7

JO - JMIR Medical Informatics

JF - JMIR Medical Informatics

IS - 1

M1 - e13039

ER -
