Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method

Peng Huang, Cheng T. Lin, Yuliang Li, Martin C. Tammemagi, Malcolm V. Brock, Sukhinder Atkar-Khattra, Yanxun Xu, Ping Hu, John R. Mayo, Heidi Schmidt, Michel Gingras, Sergio Pasian, Lori Stewart, Scott Tsai, Jean M. Seely, Daria Manos, Paul Burrowes, Rick Bhatia, Ming Sound Tsao, Stephen Lam

Research output: Contribution to journal › Article

Abstract

Background: Current lung cancer screening guidelines use either mean diameter, volume, or density of the largest lung nodule on the previous CT scan or appearance of a new nodule to ascertain the timing of the next CT scan. We aimed to develop an accurate screening protocol by estimating the 3-year lung cancer risk after two screening CT scans using deep learning of radiologists' CT readings and other universally available clinical information.

Methods: A deep learning algorithm (referred to as DeepLR) was developed using data from participants who had received at least two CT screening scans up to 2 years apart in the National Lung Screening Trial (NLST; training cohort). Double-blinded validation was done using data from participants in the Pan-Canadian Early Detection of Lung Cancer (PanCan) study (validation cohort). The primary analysis was to compare accuracy of DeepLR scores to predict lung cancer incidence at 1 year, 2 years, and 3 years with the Lung CT Screening Reporting & Data System (Lung-RADS) and volume doubling time, using time-dependent area under the receiver operating characteristic curve (AUC) analysis.

Findings: The training cohort consisted of 25 097 participants from NLST and the validation cohort comprised 2294 individuals from PanCan. In the validation cohort, DeepLR showed good discrimination, with 1-year, 2-year, and 3-year time-dependent AUC values for cancer diagnosis of 0·968 (SD 0·013), 0·946 (0·013), and 0·899 (0·017), respectively. Among individuals deemed high risk by DeepLR, 94%, 85%, and 71% of incident and interval lung cancers diagnosed within 1 year, 2 years, and 3 years, respectively, after the second screening CT scan were identified. Furthermore, individuals with high DeepLR scores had a significantly higher risk of mortality (hazard ratio 16·07, 95% CI 10·15–25·44; p<0·0001) among people with high scores on Lung-RADS.

Interpretation: DeepLR recognises patterns in both temporal and spatial changes and synergy among changes in nodule and non-nodule features. DeepLR scores could be used to accurately guide clinical management after the next scheduled repeat screening CT scan.

Funding: Allegheny Health Network, Johns Hopkins University, Terry Fox Research Institute, and British Columbia Cancer Foundation.

Original language: English (US)
Pages (from-to): e353-e362
Journal: The Lancet Digital Health
Volume: 1
Issue number: 7
DOI: https://doi.org/10.1016/S2589-7500(19)30159-1
State: Published - Nov 2019

Fingerprint

Validation Studies
Lung Neoplasms
Learning
Lung
Early Detection of Cancer
Area Under Curve
British Columbia
Information Systems
ROC Curve
Reading
Neoplasms
Cohort Studies
Learning methods
Prediction
Lung cancer
Deep learning
Screening
Guidelines
Mortality
Incidence

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Health Informatics
  • Decision Sciences (miscellaneous)
  • Health Information Management

Cite this

Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method. / Huang, Peng; Lin, Cheng T.; Li, Yuliang; Tammemagi, Martin C.; Brock, Malcolm V.; Atkar-Khattra, Sukhinder; Xu, Yanxun; Hu, Ping; Mayo, John R.; Schmidt, Heidi; Gingras, Michel; Pasian, Sergio; Stewart, Lori; Tsai, Scott; Seely, Jean M.; Manos, Daria; Burrowes, Paul; Bhatia, Rick; Tsao, Ming Sound; Lam, Stephen.

In: The Lancet Digital Health, Vol. 1, No. 7, 11.2019, p. e353-e362.

Huang, P, Lin, CT, Li, Y, Tammemagi, MC, Brock, MV, Atkar-Khattra, S, Xu, Y, Hu, P, Mayo, JR, Schmidt, H, Gingras, M, Pasian, S, Stewart, L, Tsai, S, Seely, JM, Manos, D, Burrowes, P, Bhatia, R, Tsao, MS & Lam, S 2019, 'Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method', The Lancet Digital Health, vol. 1, no. 7, pp. e353-e362. https://doi.org/10.1016/S2589-7500(19)30159-1
Huang, Peng; Lin, Cheng T.; Li, Yuliang; Tammemagi, Martin C.; Brock, Malcolm V.; Atkar-Khattra, Sukhinder; Xu, Yanxun; Hu, Ping; Mayo, John R.; Schmidt, Heidi; Gingras, Michel; Pasian, Sergio; Stewart, Lori; Tsai, Scott; Seely, Jean M.; Manos, Daria; Burrowes, Paul; Bhatia, Rick; Tsao, Ming Sound; Lam, Stephen. / Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method. In: The Lancet Digital Health. 2019; Vol. 1, No. 7. pp. e353-e362.
@article{c1db5090484e4f9d825265eee8d6272a,
title = "Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method",
abstract = "Background: Current lung cancer screening guidelines use either mean diameter, volume, or density of the largest lung nodule on the previous CT scan or appearance of a new nodule to ascertain the timing of the next CT scan. We aimed to develop an accurate screening protocol by estimating the 3-year lung cancer risk after two screening CT scans using deep learning of radiologists' CT readings and other universally available clinical information. Methods: A deep learning algorithm (referred to as DeepLR) was developed using data from participants who had received at least two CT screening scans up to 2 years apart in the National Lung Screening Trial (NLST; training cohort). Double-blinded validation was done using data from participants in the Pan-Canadian Early Detection of Lung Cancer (PanCan) study (validation cohort). The primary analysis was to compare accuracy of DeepLR scores to predict lung cancer incidence at 1 year, 2 years, and 3 years with the Lung CT Screening Reporting \& Data System (Lung-RADS) and volume doubling time, using time-dependent area under the receiver operating characteristic curve (AUC) analysis. Findings: The training cohort consisted of 25 097 participants from NLST and the validation cohort comprised 2294 individuals from PanCan. In the validation cohort, DeepLR showed good discrimination, with 1-year, 2-year, and 3-year time-dependent AUC values for cancer diagnosis of 0·968 (SD 0·013), 0·946 (0·013), and 0·899 (0·017), respectively. Among individuals deemed high risk by DeepLR, 94{\%}, 85{\%}, and 71{\%} of incident and interval lung cancers diagnosed within 1 year, 2 years, and 3 years, respectively, after the second screening CT scan were identified. Furthermore, individuals with high DeepLR scores had a significantly higher risk of mortality (hazard ratio 16·07, 95{\%} CI 10·15–25·44; p<0·0001) among people with high scores on Lung-RADS.
Interpretation: DeepLR recognises patterns in both temporal and spatial changes and synergy among changes in nodule and non-nodule features. DeepLR scores could be used to accurately guide clinical management after the next scheduled repeat screening CT scan. Funding: Allegheny Health Network, Johns Hopkins University, Terry Fox Research Institute, and British Columbia Cancer Foundation.",
author = "Peng Huang and Lin, {Cheng T.} and Yuliang Li and Tammemagi, {Martin C.} and Brock, {Malcolm V.} and Sukhinder Atkar-Khattra and Yanxun Xu and Ping Hu and Mayo, {John R.} and Heidi Schmidt and Michel Gingras and Sergio Pasian and Lori Stewart and Scott Tsai and Seely, {Jean M.} and Daria Manos and Paul Burrowes and Rick Bhatia and Tsao, {Ming Sound} and Stephen Lam",
year = "2019",
month = "11",
doi = "10.1016/S2589-7500(19)30159-1",
language = "English (US)",
volume = "1",
pages = "e353--e362",
journal = "The Lancet Digital Health",
issn = "2589-7500",
publisher = "Elsevier Ltd",
number = "7",

}

TY  - JOUR
T1  - Prediction of lung cancer risk at follow-up screening with low-dose CT
T2  - a training and validation study of a deep learning method
AU  - Huang, Peng
AU  - Lin, Cheng T.
AU  - Li, Yuliang
AU  - Tammemagi, Martin C.
AU  - Brock, Malcolm V.
AU  - Atkar-Khattra, Sukhinder
AU  - Xu, Yanxun
AU  - Hu, Ping
AU  - Mayo, John R.
AU  - Schmidt, Heidi
AU  - Gingras, Michel
AU  - Pasian, Sergio
AU  - Stewart, Lori
AU  - Tsai, Scott
AU  - Seely, Jean M.
AU  - Manos, Daria
AU  - Burrowes, Paul
AU  - Bhatia, Rick
AU  - Tsao, Ming Sound
AU  - Lam, Stephen
PY  - 2019/11
Y1  - 2019/11
AB  - Background: Current lung cancer screening guidelines use either mean diameter, volume, or density of the largest lung nodule on the previous CT scan or appearance of a new nodule to ascertain the timing of the next CT scan. We aimed to develop an accurate screening protocol by estimating the 3-year lung cancer risk after two screening CT scans using deep learning of radiologists' CT readings and other universally available clinical information. Methods: A deep learning algorithm (referred to as DeepLR) was developed using data from participants who had received at least two CT screening scans up to 2 years apart in the National Lung Screening Trial (NLST; training cohort). Double-blinded validation was done using data from participants in the Pan-Canadian Early Detection of Lung Cancer (PanCan) study (validation cohort). The primary analysis was to compare accuracy of DeepLR scores to predict lung cancer incidence at 1 year, 2 years, and 3 years with the Lung CT Screening Reporting & Data System (Lung-RADS) and volume doubling time, using time-dependent area under the receiver operating characteristic curve (AUC) analysis. Findings: The training cohort consisted of 25 097 participants from NLST and the validation cohort comprised 2294 individuals from PanCan. In the validation cohort, DeepLR showed good discrimination, with 1-year, 2-year, and 3-year time-dependent AUC values for cancer diagnosis of 0·968 (SD 0·013), 0·946 (0·013), and 0·899 (0·017), respectively. Among individuals deemed high risk by DeepLR, 94%, 85%, and 71% of incident and interval lung cancers diagnosed within 1 year, 2 years, and 3 years, respectively, after the second screening CT scan were identified. Furthermore, individuals with high DeepLR scores had a significantly higher risk of mortality (hazard ratio 16·07, 95% CI 10·15–25·44; p<0·0001) among people with high scores on Lung-RADS.
Interpretation: DeepLR recognises patterns in both temporal and spatial changes and synergy among changes in nodule and non-nodule features. DeepLR scores could be used to accurately guide clinical management after the next scheduled repeat screening CT scan. Funding: Allegheny Health Network, Johns Hopkins University, Terry Fox Research Institute, and British Columbia Cancer Foundation.

UR  - http://www.scopus.com/inward/record.url?scp=85073673195&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85073673195&partnerID=8YFLogxK
U2  - 10.1016/S2589-7500(19)30159-1
DO  - 10.1016/S2589-7500(19)30159-1
M3  - Article
AN  - SCOPUS:85073673195
VL  - 1
SP  - e353
EP  - e362
JO  - The Lancet Digital Health
JF  - The Lancet Digital Health
SN  - 2589-7500
IS  - 7
ER  -