An ensemble classifier with random projection for predicting multi-label protein subcellular localization

Shibiao Wan; Man Wai Mak; Bai Zhang; Yue Wang; Sun Yuan Kung

doi:10.1109/BIBM.2013.6732715

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

In protein subcellular localization prediction, the number of available features is typically much larger than the number of data samples. Many of these features may contain redundant or irrelevant information, causing prediction systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble multi-label classifier for predicting protein subcellular localization. Specifically, the frequencies of occurrence of gene-ontology terms are used as feature vectors, which are projected onto lower-dimensional spaces by random projection matrices whose elements follow a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to make the final decision. Experimental results on two recent datasets suggest that the proposed method can reduce the feature dimension sixfold and remarkably improve classification performance.
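The pipeline described in the abstract (project, classify per RP matrix, fuse scores) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: a regularized least-squares linear scorer stands in for the one-vs-rest SVMs, the RP entries are assumed Gaussian, and the 1/sqrt(n_reduced) scaling, the mean-score fusion, the zero decision threshold, and all function names are assumptions made for this example.

```python
import numpy as np

def make_rp_matrices(n_features, n_reduced, n_matrices, rng):
    # Entries drawn i.i.d. from N(0, 1): zero mean, unit variance.
    # (Gaussian is an assumption; the abstract only fixes mean and variance.)
    return [rng.standard_normal((n_reduced, n_features))
            for _ in range(n_matrices)]

def project(X, R):
    # Map each row of X from n_features to n_reduced dimensions.
    # The 1/sqrt(n_reduced) scaling is a common convention, assumed here.
    return X @ R.T / np.sqrt(R.shape[0])

def _with_bias(Z):
    # Append a constant column so the linear scorer has an intercept.
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def fit_one_vs_rest(Z, Y, reg=1.0):
    # Stand-in for one-vs-rest SVMs: ridge regression onto +/-1 targets,
    # one weight column per class (label).
    T = 2.0 * Y - 1.0
    Zb = _with_bias(Z)
    A = Zb.T @ Zb + reg * np.eye(Zb.shape[1])
    return np.linalg.solve(A, Zb.T @ T)

def fit_ensemble(X, Y, n_reduced, n_matrices, seed=0):
    # One linear one-vs-rest scorer per random projection matrix.
    rng = np.random.default_rng(seed)
    Rs = make_rp_matrices(X.shape[1], n_reduced, n_matrices, rng)
    return [(R, fit_one_vs_rest(project(X, R), Y)) for R in Rs]

def predict(ensemble, X, threshold=0.0):
    # Fuse by averaging per-class scores over all ensemble members.
    fused = np.mean([_with_bias(project(X, R)) @ W for R, W in ensemble],
                    axis=0)
    labels = fused > threshold
    # Every protein gets at least one location: force the top-scoring class.
    labels[np.arange(X.shape[0]), fused.argmax(axis=1)] = True
    return labels.astype(int), fused
```

With hypothetical inputs `X` (samples by GO-term frequencies) and `Y` (a 0/1 samples-by-locations matrix), a call such as `fit_ensemble(X, Y, n_reduced=X.shape[1] // 6, n_matrices=10)` would mirror the sixfold reduction mentioned in the abstract; the ensemble size of 10 is arbitrary.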

Original language	English (US)
Title of host publication	Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Pages	35-42
Number of pages	8
DOIs	https://doi.org/10.1109/BIBM.2013.6732715
State	Published - 2013
Externally published	Yes
Event	2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 - Shanghai, China Duration: Dec 18 2013 → Dec 21 2013

Publication series

Name	Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013

Other

Other	2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Country/Territory	China
City	Shanghai
Period	12/18/13 → 12/21/13

Keywords

Dimension reduction
Multi-label classification
Protein subcellular localization
Random projection
Support vector machines

ASJC Scopus subject areas

Biomedical Engineering

Access to Document

10.1109/BIBM.2013.6732715

Cite this

Wan, S., Mak, M. W., Zhang, B., Wang, Y., & Kung, S. Y. (2013). An ensemble classifier with random projection for predicting multi-label protein subcellular localization. In Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 (pp. 35-42). Article 6732715 (Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013). https://doi.org/10.1109/BIBM.2013.6732715

@inproceedings{5bd0046d4cb84a67bbf09b032c99630f,

title = "An ensemble classifier with random projection for predicting multi-label protein subcellular localization",

keywords = "Dimension reduction, Multi-label classification, Protein subcellular localization, Random projection, Support vector machines",

author = "Shibiao Wan and Mak, {Man Wai} and Bai Zhang and Yue Wang and Kung, {Sun Yuan}",

year = "2013",

doi = "10.1109/BIBM.2013.6732715",

language = "English (US)",

isbn = "9781479913091",

series = "Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013",

pages = "35--42",

booktitle = "Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013",

note = "2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 ; Conference date: 18-12-2013 Through 21-12-2013",

}

TY - GEN

T1 - An ensemble classifier with random projection for predicting multi-label protein subcellular localization

AU - Wan, Shibiao

AU - Mak, Man Wai

AU - Zhang, Bai

AU - Wang, Yue

AU - Kung, Sun Yuan

PY - 2013

Y1 - 2013

KW - Dimension reduction

KW - Multi-label classification

KW - Protein subcellular localization

KW - Random projection

KW - Support vector machines

UR - http://www.scopus.com/inward/record.url?scp=84894554386&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84894554386&partnerID=8YFLogxK

U2 - 10.1109/BIBM.2013.6732715

DO - 10.1109/BIBM.2013.6732715

M3 - Conference contribution

AN - SCOPUS:84894554386

SN - 9781479913091

T3 - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013

SP - 35

EP - 42

BT - Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013

T2 - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013

Y2 - 18 December 2013 through 21 December 2013

ER -
