The top-scoring 'N' algorithm: a generalized relative expression classification method from small numbers of biomolecules

Andrew T. Magis; Nathan D. Price

doi:10.1186/1471-2105-13-227

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

Background: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms that uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification.

Results: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, naïve Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets.

Conclusions: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer measured features are available.
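The abstract summarizes the relative expression idea without giving formulas. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a samples-by-genes matrix, binary class labels 0/1, and a scoring rule (half the L1 distance between the class-conditional frequencies of each within-tuple ordering) chosen here only because it reduces to a TSP-style pairwise score when N = 2. The names permutation_of, class_frequencies, tsn_score, TSNModel and fit_tsn are hypothetical, and the toy data are invented. With G measured genes, an exhaustive search visits C(G, N) gene tuples, each with N! possible orderings, which is why the choice of N controls both the combination and the permutation space.

```python
import math
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Perm = Tuple[int, ...]  # an ordering of positions 0..N-1 within a gene tuple
Freqs = Dict[int, Dict[Perm, float]]  # class label (0/1) -> ordering -> relative frequency


def permutation_of(values: Sequence[float]) -> Perm:
    # Positions listed from lowest to highest value; ties are broken by position.
    return tuple(sorted(range(len(values)), key=lambda k: (values[k], k)))


def class_frequencies(X: List[List[float]], y: List[int], genes: Tuple[int, ...]) -> Freqs:
    # Relative frequency of each observed ordering of the chosen genes, per class.
    counts = {0: Counter(), 1: Counter()}
    for row, label in zip(X, y):
        counts[label][permutation_of([row[g] for g in genes])] += 1
    return {c: {p: n / sum(cnt.values()) for p, n in cnt.items()} for c, cnt in counts.items()}


def tsn_score(X: List[List[float]], y: List[int], genes: Tuple[int, ...]) -> float:
    # ASSUMED score: half the L1 distance between the two classes' ordering distributions.
    # With N = 2 this equals the pairwise score |P(order A | class 0) - P(order A | class 1)|.
    f = class_frequencies(X, y, genes)
    perms = set(f[0]) | set(f[1])
    return 0.5 * sum(abs(f[0].get(p, 0.0) - f[1].get(p, 0.0)) for p in perms)


@dataclass
class TSNModel:
    genes: Tuple[int, ...]
    score: float
    freqs: Freqs

    def predict(self, sample: Sequence[float]) -> int:
        p = permutation_of([sample[g] for g in self.genes])
        # Predict the class in which this ordering was more frequent in training;
        # ties (including unseen orderings) go to class 0, a hypothetical choice.
        return 1 if self.freqs[1].get(p, 0.0) > self.freqs[0].get(p, 0.0) else 0


def fit_tsn(X: List[List[float]], y: List[int], n: int) -> TSNModel:
    # Exhaustive search over all C(G, n) gene tuples; max() keeps the first best-scoring one.
    best = max(combinations(range(len(X[0])), n), key=lambda g: tsn_score(X, y, g))
    return TSNModel(best, tsn_score(X, y, best), class_frequencies(X, y, best))


# Toy data (invented for illustration): 6 samples x 3 genes, two classes.
X = [
    [1.0, 5.0, 3.0], [2.0, 6.0, 0.5], [1.5, 7.0, 8.0],  # class 0: gene 0 < gene 1
    [5.0, 1.0, 3.0], [6.0, 2.0, 9.0], [7.0, 1.5, 0.5],  # class 1: gene 0 > gene 1
]
y = [0, 0, 0, 1, 1, 1]

model = fit_tsn(X, y, n=2)
print(model.genes, model.score)  # (0, 1) 1.0
print(model.predict([0.5, 9.0, 4.0]), model.predict([9.0, 0.2, 4.0]))  # 0 1

# Orderings are unchanged by a strictly increasing transform such as log.
log_X = [[math.log(v) for v in row] for row in X]
print(fit_tsn(log_X, y, n=2).genes == model.genes)  # True
```

Because only within-sample orderings enter the score, the last check shows that a strictly increasing per-value transform (here log) leaves the selected genes unchanged, a small-scale illustration of the normalization invariance the abstract mentions. Tie-breaking between equally scored tuples and between classes is arbitrary in this sketch; a practical classifier would need an explicit policy.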

Original language	English (US)
Article number	227
Journal	BMC Bioinformatics
Volume	13
Issue number	1
DOIs	https://doi.org/10.1186/1471-2105-13-227
State	Published - Sep 11 2012
Externally published	Yes

Keywords

Classification
Cross validation
Graphics processing unit
Microarray
Relative expression
Support vector machine
Top-scoring pair

ASJC Scopus subject areas

Structural Biology
Biochemistry
Molecular Biology
Computer Science Applications
Applied Mathematics

Cite this

@article{ced646acc213464daacf077351913919,

title = "The top-scoring {'N'} algorithm: a generalized relative expression classification method from small numbers of biomolecules",

abstract = "Background: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms that uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification. Results: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, na{\"i}ve Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets. Conclusions: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer measured features are available.",

keywords = "Classification, Cross validation, Graphics processing unit, Microarray, Relative expression, Support vector machine, Top-scoring pair",

author = "Magis, {Andrew T.} and Price, {Nathan D.}",

note = "Funding Information: The authors thank Dr. Don Geman and Bahman Afsari for valuable discussions during the development of this paper. This work was supported by a National Institutes of Health Howard Temin Pathway to Independence Award in Cancer Research [R00 CA126184]; the Camille Dreyfus Teacher-Scholar Program, and the Grand Duchy of Luxembourg-ISB Systems Medicine Consortium.",

year = "2012",

month = sep,

day = "11",

doi = "10.1186/1471-2105-13-227",

language = "English (US)",

volume = "13",

journal = "BMC Bioinformatics",

issn = "1471-2105",

publisher = "BioMed Central",

number = "1",

pages = "227",

}

TY - JOUR

T1 - The top-scoring 'N' algorithm: a generalized relative expression classification method from small numbers of biomolecules

AU - Magis, Andrew T.

AU - Price, Nathan D.

N1 - Funding Information: The authors thank Dr. Don Geman and Bahman Afsari for valuable discussions during the development of this paper. This work was supported by a National Institutes of Health Howard Temin Pathway to Independence Award in Cancer Research [R00 CA126184]; the Camille Dreyfus Teacher-Scholar Program, and the Grand Duchy of Luxembourg-ISB Systems Medicine Consortium.

PY - 2012/9/11

Y1 - 2012/9/11

N2 - Background: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms that uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification. Results: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, naïve Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets. Conclusions: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer measured features are available.

AB - Background: Relative expression algorithms such as the top-scoring pair (TSP) and the top-scoring triplet (TST) have several strengths that distinguish them from other classification methods, including resistance to overfitting, invariance to most data normalization methods, and biological interpretability. The top-scoring 'N' (TSN) algorithm is a generalized form of other relative expression algorithms that uses generic permutations and a dynamic classifier size to control both the permutation and combination space available for classification. Results: TSN was tested on nine cancer datasets, showing statistically significant differences in classification accuracy between different classifier sizes (choices of N). TSN also performed competitively against a wide variety of classification methods, including artificial neural networks, classification trees, discriminant analysis, k-nearest neighbor, naïve Bayes, and support vector machines, when tested on the Microarray Quality Control II datasets. Furthermore, TSN exhibits low levels of overfitting on training data compared to other methods, giving confidence that results obtained during cross validation will be more generally applicable to external validation sets. Conclusions: TSN preserves the strengths of other relative expression algorithms while allowing a much larger permutation and combination space to be explored, potentially improving classification accuracies when fewer measured features are available.

KW - Classification

KW - Cross validation

KW - Graphics processing unit

KW - Microarray

KW - Relative expression

KW - Support vector machine

KW - Top-scoring pair

UR - http://www.scopus.com/inward/record.url?scp=84865959225&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84865959225&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-13-227

DO - 10.1186/1471-2105-13-227

M3 - Article

C2 - 22966958

AN - SCOPUS:84865959225

SN - 1471-2105

VL - 13

JO - BMC Bioinformatics

JF - BMC Bioinformatics

IS - 1

M1 - 227

ER -