Management of thyroid nodules seen on US images: Deep learning may match performance of radiologists

Mateusz Buda; Benjamin Wildman-Tobriner; Jenny K. Hoang; David Thayer; Franklin N. Tessler; William D. Middleton; Maciej A. Mazurowski

doi:10.1148/radiol.2019181343

Research output: Contribution to journal › Article › peer-review

38 Scopus citations

Abstract

Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with the performance of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved 13 of 15 (87%; 95% confidence interval [CI]: 67%, 100%) sensitivity, the same as expert consensus (P > .99) and higher than five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists were 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: Sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists who used American College of Radiology Thyroid Imaging Reporting and Data System guidelines.
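
The percentages quoted in the Results can be checked from the raw counts. The following is a minimal sketch, not the authors' code: it assumes the denominators 15 and 84 are the malignant and benign nodules in the 99-nodule test set, and the helper name `proportion_pct` is hypothetical. It reproduces only the point estimates, not the confidence intervals, because the abstract does not state how those were computed.

```python
def proportion_pct(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts quoted in the abstract for the 99 test nodules.
# Assumption: 15 malignant + 84 benign nodules make up the test set.
dl_sensitivity = proportion_pct(13, 15)      # deep learning, sensitivity
dl_specificity = proportion_pct(44, 84)      # deep learning, specificity
expert_specificity = proportion_pct(43, 84)  # expert consensus, specificity

print(f"Deep learning sensitivity: {dl_sensitivity}%")
print(f"Deep learning specificity: {dl_specificity}%")
print(f"Expert consensus specificity: {expert_specificity}%")
```

The two denominators sum to 99, which is consistent with the size of the test set stated in the abstract.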

Original language	English (US)
Pages (from-to)	695-701
Number of pages	7
Journal	RADIOLOGY
Volume	292
Issue number	3
DOIs	https://doi.org/10.1148/radiol.2019181343
State	Published - 2019
Externally published	Yes

ASJC Scopus subject areas

Radiology, Nuclear Medicine and Imaging

Access to Document

10.1148/radiol.2019181343

Cite this

@article{16b9709678eb4eb0955ec12bfaeb5ce9,

title = "Management of thyroid nodules seen on {US} images: Deep learning may match performance of radiologists",

abstract = "Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with the performance of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved 13 of 15 (87%; 95% confidence interval [CI]: 67%, 100%) sensitivity, the same as expert consensus (P > .99) and higher than five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists were 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: Sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists who used American College of Radiology Thyroid Imaging Reporting and Data System guidelines.",

author = "Mateusz Buda and Benjamin Wildman-Tobriner and Hoang, {Jenny K.} and David Thayer and Tessler, {Franklin N.} and Middleton, {William D.} and Mazurowski, {Maciej A.}",

note = "Publisher Copyright: {\textcopyright} RSNA, 2019.",

year = "2019",

doi = "10.1148/radiol.2019181343",

language = "English (US)",

volume = "292",

pages = "695--701",

journal = "RADIOLOGY",

issn = "0033-8419",

publisher = "Radiological Society of North America Inc.",

number = "3",

}

TY - JOUR

T1 - Management of thyroid nodules seen on US images

T2 - Deep learning may match performance of radiologists

AU - Buda, Mateusz

AU - Wildman-Tobriner, Benjamin

AU - Hoang, Jenny K.

AU - Thayer, David

AU - Tessler, Franklin N.

AU - Middleton, William D.

AU - Mazurowski, Maciej A.

N1 - Publisher Copyright: © RSNA, 2019.

PY - 2019

Y1 - 2019

AB - Background: Management of thyroid nodules may be inconsistent between different observers and time consuming for radiologists. An artificial intelligence system that uses deep learning may improve radiology workflow for management of thyroid nodules. Purpose: To develop a deep learning algorithm that uses thyroid US images to decide whether a thyroid nodule should undergo a biopsy and to compare the performance of the algorithm with the performance of radiologists who adhere to American College of Radiology (ACR) Thyroid Imaging Reporting and Data System (TI-RADS). Materials and Methods: In this retrospective analysis, studies in patients referred for US with subsequent fine-needle aspiration or with surgical histologic analysis used as the standard were evaluated. The study period was from August 2006 to May 2010. A multitask deep convolutional neural network was trained to provide biopsy recommendations for thyroid nodules on the basis of two orthogonal US images as the input. In the training phase, the deep learning algorithm was first evaluated by using 10-fold cross-validation. Internal validation was then performed on an independent set of 99 consecutive nodules. The sensitivity and specificity of the algorithm were compared with a consensus of three ACR TI-RADS committee experts and nine other radiologists, all of whom interpreted thyroid US images in clinical practice. Results: Included were 1377 thyroid nodules in 1230 patients with complete imaging data and conclusive cytologic or histologic diagnoses. For the 99 test nodules, the proposed deep learning algorithm achieved 13 of 15 (87%; 95% confidence interval [CI]: 67%, 100%) sensitivity, the same as expert consensus (P > .99) and higher than five of nine radiologists. The specificity of the deep learning algorithm was 44 of 84 (52%; 95% CI: 42%, 62%), which was similar to expert consensus (43 of 84; 51%; 95% CI: 41%, 62%; P = .91) and higher than seven of nine other radiologists. The mean sensitivity and specificity for the nine radiologists were 83% (95% CI: 64%, 98%) and 48% (95% CI: 37%, 59%), respectively. Conclusion: Sensitivity and specificity of a deep learning algorithm for thyroid nodule biopsy recommendations were similar to those of expert radiologists who used American College of Radiology Thyroid Imaging Reporting and Data System guidelines.

UR - http://www.scopus.com/inward/record.url?scp=85071387984&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071387984&partnerID=8YFLogxK

U2 - 10.1148/radiol.2019181343

DO - 10.1148/radiol.2019181343

M3 - Article

C2 - 31287391

AN - SCOPUS:85071387984

SN - 0033-8419

VL - 292

SP - 695

EP - 701

JO - RADIOLOGY

JF - RADIOLOGY

IS - 3

ER -
