Evaluating the evaluation of cancer driver genes

Collin J. Tokheim, Nickolas Papadopoulos, Kenneth W. Kinzler, Bert Vogelstein, Rachel Karchin

Research output: Contribution to journal › Article

Abstract

Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.
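The abstract's point about P values can be illustrated with a small sketch. Under a correctly specified null model, P values for non-driver genes should be roughly uniform on [0, 1], so a large gap between their empirical distribution and the uniform distribution suggests miscalibration. The code below computes a one-sample Kolmogorov-Smirnov statistic against Uniform(0, 1). This is an illustrative assumption, not necessarily the metric used in the paper, and the function name and simulated data are hypothetical.

```python
import random


def ks_uniform_statistic(p_values):
    """Largest gap between the empirical CDF of p_values and the Uniform(0, 1) CDF."""
    ps = sorted(p_values)
    n = len(ps)
    if n == 0:
        raise ValueError("need at least one P value")
    d = 0.0
    for i, p in enumerate(ps):
        # The empirical CDF steps from i/n to (i+1)/n at p; the uniform CDF at p is p.
        d = max(d, (i + 1) / n - p, p - i / n)
    return d


rng = random.Random(0)

# Simulated P values from a well-calibrated method (uniform under the null).
calibrated = [rng.random() for _ in range(2000)]

# Hypothetical miscalibrated method: squaring pushes P values toward 0,
# which would inflate the number of apparent driver genes.
inflated = [rng.random() ** 2 for _ in range(2000)]

d_calibrated = ks_uniform_statistic(calibrated)
d_inflated = ks_uniform_statistic(inflated)
print(f"calibrated: D = {d_calibrated:.3f}")
print(f"inflated:   D = {d_inflated:.3f}")
```

A statistic near zero is consistent with uniform P values, while the simulated inflated set gives a much larger value because too many of its P values fall near zero.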

Original language: English (US)
Pages (from-to): 14330-14335
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 113
Issue number: 50
DOI: 10.1073/pnas.1616440113
State: Published - Dec 13 2016

Fingerprint

Neoplasm Genes
Genes
Mutation
Mutation Rate

Keywords

  • Cancer genomics
  • Cancer mutations
  • Computational method evaluation
  • DNA sequencing
  • Driver genes

ASJC Scopus subject areas

  • General

Cite this

Evaluating the evaluation of cancer driver genes. / Tokheim, Collin J.; Papadopoulos, Nickolas; Kinzler, Kenneth W; Vogelstein, Bert; Karchin, Rachel.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 113, No. 50, 13.12.2016, p. 14330-14335.

@article{c8ddb83cfda24f51a9530b054f2e42c7,
title = "Evaluating the evaluation of cancer driver genes",
abstract = "Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.",
keywords = "Cancer genomics, Cancer mutations, Computational method evaluation, DNA sequencing, Driver genes",
author = "Tokheim, {Collin J.} and Nickolas Papadopoulos and Kinzler, {Kenneth W} and Bert Vogelstein and Rachel Karchin",
year = "2016",
month = "12",
day = "13",
doi = "10.1073/pnas.1616440113",
language = "English (US)",
volume = "113",
pages = "14330--14335",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "50",

}

TY - JOUR

T1 - Evaluating the evaluation of cancer driver genes

AU - Tokheim, Collin J.

AU - Papadopoulos, Nickolas

AU - Kinzler, Kenneth W

AU - Vogelstein, Bert

AU - Karchin, Rachel

PY - 2016/12/13

Y1 - 2016/12/13

N2 - Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.

AB - Sequencing has identified millions of somatic mutations in human cancers, but distinguishing cancer driver genes remains a major challenge. Numerous methods have been developed to identify driver genes, but evaluation of the performance of these methods is hindered by the lack of a gold standard, that is, bona fide driver gene mutations. Here, we establish an evaluation framework that can be applied to driver gene prediction methods. We used this framework to compare the performance of eight such methods. One of these methods, described here, incorporated a machine-learning-based ratiometric approach. We show that the driver genes predicted by each of the eight methods vary widely. Moreover, the P values reported by several of the methods were inconsistent with the uniform values expected, thus calling into question the assumptions that were used to generate them. Finally, we evaluated the potential effects of unexplained variability in mutation rates on false-positive driver gene predictions. Our analysis points to the strengths and weaknesses of each of the currently available methods and offers guidance for improving them in the future.

KW - Cancer genomics

KW - Cancer mutations

KW - Computational method evaluation

KW - DNA sequencing

KW - Driver genes

UR - http://www.scopus.com/inward/record.url?scp=85005966836&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85005966836&partnerID=8YFLogxK

U2 - 10.1073/pnas.1616440113

DO - 10.1073/pnas.1616440113

M3 - Article

VL - 113

SP - 14330

EP - 14335

JO - Proceedings of the National Academy of Sciences of the United States of America

JF - Proceedings of the National Academy of Sciences of the United States of America

SN - 0027-8424

IS - 50

ER -