Crowdsourcing to Evaluate Fundus Photographs for the Presence of Glaucoma

Xueyang Wang, Lucy I. Mudie, Mani Baskaran, Ching Yu Cheng, Wallace L. Alward, David S Friedman, Christopher J. Brady

Research output: Contribution to journal › Article

Abstract

Purpose: To assess the accuracy of crowdsourcing for grading optic nerve images for glaucoma using Amazon Mechanical Turk before and after training modules. Materials and Methods: Images (n=60) from 2 large population studies were graded for glaucoma status and vertical cup-to-disc ratio (VCDR). In the baseline trial, users on Amazon Mechanical Turk (Turkers) graded fundus photos for glaucoma and VCDR after reviewing annotated example images. In 2 additional trials, Turkers viewed a 26-slide PowerPoint training or a 10-minute video training and passed a quiz before being permitted to grade the same 60 images. Each image was graded by 10 unique Turkers in all trials. The mode of Turker grades for each image was compared with an adjudicated expert grade to determine the accuracy, sensitivity, and specificity of Turker grading. Results: In the baseline study, 50% of the images were graded correctly for glaucoma status and the area under the receiver operating characteristic curve (AUROC) was 0.75 [95% confidence interval (CI), 0.64-0.87]. Post-PowerPoint training, 66.7% of the images were graded correctly, with an AUROC of 0.86 (95% CI, 0.78-0.95). Finally, Turker grading accuracy was 63.3%, with an AUROC of 0.89 (95% CI, 0.83-0.96), after video training. Overall, Turker VCDR grades for each image correlated with expert VCDR grades (Bland-Altman plot mean difference = -0.02). Conclusions: Turkers graded 60 fundus images quickly and at low cost, with grading accuracy, sensitivity, and specificity all improving with brief training. With effective education, crowdsourcing may be an efficient tool to aid in the identification of glaucomatous changes in retinal images.
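
The abstract describes taking the mode of 10 Turker grades per image as the consensus grade, comparing it against an adjudicated expert grade to obtain accuracy, sensitivity, specificity, and AUROC, and summarizing VCDR agreement with a Bland-Altman mean difference. The sketch below is only an illustration of that kind of analysis, not the study's code: the simulated data, the variable names, and the use of the Turker vote fraction as the AUROC score are all assumptions, and it relies on NumPy and scikit-learn.

```python
# Minimal sketch of a consensus-grading analysis (simulated data; illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)
n_images, n_turkers = 60, 10

# Adjudicated expert grade per image (1 = glaucoma, 0 = not glaucoma).
expert_glaucoma = rng.integers(0, 2, size=n_images)
# Simulated Turker grades that loosely agree with the expert grade.
agree_prob = np.where(expert_glaucoma[:, None] == 1, 0.7, 0.3)
turker_glaucoma = (rng.random((n_images, n_turkers)) < agree_prob).astype(int)

# Consensus grade per image: the mode of the 10 Turker grades
# (with binary grades, the mode is simply a majority vote).
votes = turker_glaucoma.sum(axis=1)
consensus = (votes > n_turkers / 2).astype(int)

accuracy = (consensus == expert_glaucoma).mean()
tn, fp, fn, tp = confusion_matrix(expert_glaucoma, consensus).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# AUROC, assuming the fraction of Turkers calling glaucoma is used as the score.
auroc = roc_auc_score(expert_glaucoma, votes / n_turkers)

# VCDR agreement: Bland-Altman mean difference between mean Turker and expert VCDR.
expert_vcdr = rng.uniform(0.2, 0.9, size=n_images)
turker_vcdr = expert_vcdr[:, None] + rng.normal(0, 0.1, size=(n_images, n_turkers))
diff = turker_vcdr.mean(axis=1) - expert_vcdr
mean_diff = diff.mean()
loa = (mean_diff - 1.96 * diff.std(ddof=1), mean_diff + 1.96 * diff.std(ddof=1))

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} AUROC={auroc:.2f}")
print(f"Bland-Altman mean difference={mean_diff:+.3f}, "
      f"95% limits of agreement=({loa[0]:+.3f}, {loa[1]:+.3f})")
```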

Original language: English (US)
Pages (from-to): 505-510
Number of pages: 6
Journal: Journal of Glaucoma
Volume: 26
Issue number: 6
ISSN: 1057-0829
DOI: 10.1097/IJG.0000000000000660
State: Published - 2017

Fingerprint

  • Crowdsourcing
  • Glaucoma
  • ROC Curve
  • Confidence Intervals
  • Sensitivity and Specificity
  • Optic Nerve
  • Education
  • Costs and Cost Analysis
  • Population

Keywords

  • crowdsourcing
  • image analysis
  • teleglaucoma

ASJC Scopus subject areas

  • Ophthalmology

Cite this

Wang, X., Mudie, L. I., Baskaran, M., Cheng, C. Y., Alward, W. L., Friedman, D. S., & Brady, C. J. (2017). Crowdsourcing to Evaluate Fundus Photographs for the Presence of Glaucoma. Journal of Glaucoma, 26(6), 505-510. https://doi.org/10.1097/IJG.0000000000000660
