Automated generation of radiologic descriptions on brain volume changes from T1-weighted MR images: Initial assessment of feasibility

The Alzheimer's Disease Neuroimaging Initiative

Research output: Contribution to journal › Article

Abstract

Purpose: To examine the feasibility and potential difficulties of automatically generating radiologic reports (RRs) to articulate the clinically important features of brain magnetic resonance (MR) images. Materials and Methods: We focused on examining brain atrophy by using magnetization-prepared rapid gradient-echo (MPRAGE) images. The technology was based on multi-atlas whole-brain segmentation that identified 283 structures, from which larger superstructures were created to represent the anatomic units most frequently used in RRs. Through two layers of data-reduction filters, based on anatomic and clinical knowledge, raw images (~10 MB) were converted to a few kilobytes of human-readable sentences. The tool was applied to images from 92 patients with memory problems, and the results were compared to RRs independently produced by three experienced radiologists. The mechanisms of disagreement were investigated to understand where the machine-human interface succeeded or failed. Results: The automatically generated sentences showed low sensitivity (mean: 24.5%) and precision (mean: 24.9%); these were significantly lower than the inter-rater sensitivity (mean: 32.7%) and precision (mean: 32.2%) of the radiologists. The causes of disagreement fell into six error categories: mismatch of anatomic definitions (7.2 ± 9.3%), data-reduction errors (11.4 ± 3.9%), translator errors (3.1 ± 3.1%), differences in the spatial extent of the anatomic terms used (8.3 ± 6.7%), segmentation quality (9.8 ± 2.0%), and the sentence-triggering threshold (60.2 ± 16.3%). Conclusion: These error mechanisms raise interesting questions about the potential of automated report generation and the quality of image reading by humans. The largest discrepancy between the human-written and automatically generated RRs was caused by the sentence-triggering threshold (the degree of abnormality), which was fixed at a z-score > 2.0 for automated generation, whereas the thresholds used by the radiologists varied across anatomic structures.
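The fixed-threshold triggering rule at the center of the study is simple enough to sketch. The following minimal Python illustration shows how a z-score cutoff of 2.0 can convert segmented structure volumes into report sentences; the structure names, normative means and standard deviations, and sentence templates below are hypothetical, and the paper's actual pipeline (multi-atlas segmentation of 283 structures, superstructure grouping, and the two data-reduction filters) is not reproduced here.

# Hypothetical sketch of a z-score-based sentence trigger.
# All structure names, normative statistics, and phrasings are
# illustrative assumptions, not the authors' implementation.

Z_THRESHOLD = 2.0  # the paper's fixed sentence-triggering threshold

# Hypothetical normative volumes (mL): (mean, SD) from a control population.
NORMATIVE = {
    "hippocampus": (3.5, 0.4),
    "lateral ventricle": (25.0, 8.0),
}

def z_score(volume, mean, sd):
    # Standardize a measured volume against the normative distribution.
    return (volume - mean) / sd

def atrophy_sentences(volumes, threshold=Z_THRESHOLD):
    # Emit a sentence for each structure whose volume deviates from the
    # norm by more than `threshold` standard deviations, in either direction:
    # atrophy shrinks parenchymal structures but enlarges the ventricles.
    sentences = []
    for structure, volume in volumes.items():
        mean, sd = NORMATIVE[structure]
        z = z_score(volume, mean, sd)
        if z < -threshold:
            sentences.append(f"The {structure} shows volume loss (z = {z:.1f}).")
        elif z > threshold:
            sentences.append(f"The {structure} is enlarged (z = {z:.1f}).")
    return sentences

print(atrophy_sentences({"hippocampus": 2.5, "lateral ventricle": 45.0}))
# -> ['The hippocampus shows volume loss (z = -2.5).',
#     'The lateral ventricle is enlarged (z = 2.5).']

Per the paper's conclusion, radiologists effectively apply different, structure-dependent thresholds when deciding whether a finding merits a sentence, which a single fixed cutoff like the one above cannot capture.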

Original language: English (US)
Article number: 7
Journal: Frontiers in Neurology
Volume: 10
Issue number: JAN
DOIs: 10.3389/fneur.2019.00007
State: Published - Jan 1 2019

Keywords

  • 3D T1 weighted image
  • Automated generation
  • Brain atlas
  • Brain atrophy
  • Dementia
  • Radiologic description

ASJC Scopus subject areas

  • Neurology
  • Clinical Neurology

Cite this

Automated generation of radiologic descriptions on brain volume changes from T1-weighted MR images: Initial assessment of feasibility. / The Alzheimer's Disease Neuroimaging Initiative.

In: Frontiers in Neurology, Vol. 10, No. JAN, 7, 01.01.2019.

@article{6c8f4c44f21242219200a053eae7748f,
title = "Automated generation of radiologic descriptions on brain volume changes from T1-weighted MR images: Initial assessment of feasibility",
abstract = "Purpose: To examine the feasibility and potential difficulties of automatically generating radiologic reports (RRs) to articulate the clinically important features of brain magnetic resonance (MR) images. Materials and Methods: We focused on examining brain atrophy by using magnetization-prepared rapid gradient-echo (MPRAGE) images. The technology was based on multi-atlas whole-brain segmentation that identified 283 structures, from which larger superstructures were created to represent the anatomic units most frequently used in RRs. Through two layers of data-reduction filters, based on anatomic and clinical knowledge, raw images (~10 MB) were converted to a few kilobytes of human-readable sentences. The tool was applied to images from 92 patients with memory problems, and the results were compared to RRs independently produced by three experienced radiologists. The mechanisms of disagreement were investigated to understand where machine-human interface succeeded or failed. Results: The automatically generated sentences had low sensitivity (mean: 24.5{\%}) and precision (mean: 24.9{\%}) values; these were significantly lower than the inter-rater sensitivity (mean: 32.7{\%}) and precision (mean: 32.2{\%}) of the radiologists. The causes of disagreement were divided into six error categories: mismatch of anatomic definitions (7.2 ± 9.3{\%}), data-reduction errors (11.4 ± 3.9{\%}), translator errors (3.1 ± 3.1{\%}), difference in the spatial extent of used anatomic terms (8.3 ± 6.7{\%}), segmentation quality (9.8 ± 2.0{\%}), and threshold for sentence-triggering (60.2 ± 16.3{\%}). Conclusion: These error mechanisms raise interesting questions about the potential of automated report generation and the quality of image reading by humans. The most significant discrepancy between the human and automatically generated RRs was caused by the sentence-triggering threshold (the degree of abnormality), which was fixed to z-score >2.0 for the automated generation, while the thresholds by radiologists varied among different anatomical structures.",
keywords = "3D T1 weighted image, Automated generation, Brain atlas, Brain atrophy, Dementia, Radiologic description",
author = "{The Alzheimer's Disease Neuroimaging Initiative} and Kentaro Akazawa and Ryo Sakamoto and Satoshi Nakajima and Dan Wu and Yue Li and Kenichi Oishi and Faria, {Andreia Vasconcellos} and Kei Yamada and Kaori Togashi and Lyketsos, {Constantine G} and Miller, {Michael I.} and Susumu Mori",
year = "2019",
month = "1",
day = "1",
doi = "10.3389/fneur.2019.00007",
language = "English (US)",
volume = "10",
journal = "Frontiers in Neurology",
issn = "1664-2295",
publisher = "Frontiers Research Foundation",
number = "JAN",

}

TY - JOUR

T1 - Automated generation of radiologic descriptions on brain volume changes from T1-weighted MR images

T2 - Initial assessment of feasibility

AU - The Alzheimer's Disease Neuroimaging Initiative

AU - Akazawa, Kentaro

AU - Sakamoto, Ryo

AU - Nakajima, Satoshi

AU - Wu, Dan

AU - Li, Yue

AU - Oishi, Kenichi

AU - Faria, Andreia Vasconcellos

AU - Yamada, Kei

AU - Togashi, Kaori

AU - Lyketsos, Constantine G

AU - Miller, Michael I.

AU - Mori, Susumu

PY - 2019/1/1

Y1 - 2019/1/1

KW - 3D T1 weighted image

KW - Automated generation

KW - Brain atlas

KW - Brain atrophy

KW - Dementia

KW - Radiologic description

UR - http://www.scopus.com/inward/record.url?scp=85065476622&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065476622&partnerID=8YFLogxK

U2 - 10.3389/fneur.2019.00007

DO - 10.3389/fneur.2019.00007

M3 - Article

C2 - 30733701

AN - SCOPUS:85065476622

VL - 10

JO - Frontiers in Neurology

JF - Frontiers in Neurology

SN - 1664-2295

IS - JAN

M1 - 7

ER -