Ensemble random projection for multi-label classification with application to protein subcellular localization

Shibiao Wan, Man Wai Mak, Bai Zhang, Yue Wang, Sun Yuan Kung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The curse of dimensionality severely restricts the predictive power of multi-label classification systems. High-dimensional feature vectors may contain redundant or irrelevant information, causing classification systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble of multi-label classifiers. The merits of the proposed method are demonstrated through a multi-label protein classification task. Specifically, high-dimensional feature vectors are extracted from protein sequences using the Gene Ontology (GO) and Swiss-Prot databases. The feature vectors are then projected onto lower-dimensional spaces by random projection matrices whose elements are drawn from a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to predict the subcellular localization of proteins. Experimental results suggest that the proposed method can reduce the feature dimension sevenfold while also improving classification performance.
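
The pipeline described in the abstract (random projection, one classifier set per projection matrix, score fusion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. All function names are hypothetical, the entries of each matrix are assumed to be Gaussian, the 1/sqrt(out_dim) scaling is an assumed convention, a nearest-centroid linear scorer stands in for the SVMs to keep the example dependency-free, and fusion by simple averaging with a zero decision threshold is likewise an assumption.

```python
import math
import random


def make_rp_matrix(out_dim, in_dim, rng):
    """Random projection matrix with i.i.d. N(0, 1) entries (zero mean, unit variance)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, x):
    """Project x onto the lower-dimensional space (assumed 1/sqrt(out_dim) scaling)."""
    scale = 1.0 / math.sqrt(len(matrix))
    return [scale * sum(r * v for r, v in zip(row, x)) for row in matrix]


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_one_vs_rest(vectors, label_sets, num_labels):
    """Fit one linear scorer (w, b) per label.

    Stand-in for the SVMs: a nearest-centroid rule. Assumes every label has
    at least one positive and one negative training vector.
    """
    scorers = []
    for label in range(num_labels):
        pos = mean_vector([v for v, s in zip(vectors, label_sets) if label in s])
        neg = mean_vector([v for v, s in zip(vectors, label_sets) if label not in s])
        w = [p - q for p, q in zip(pos, neg)]
        # Bias places the decision boundary midway between the two centroids.
        b = -sum(wi * (p + q) / 2.0 for wi, p, q in zip(w, pos, neg))
        scorers.append((w, b))
    return scorers


def train_ensemble(X, label_sets, num_labels, out_dim, num_members, seed=0):
    """Train one set of one-vs-rest scorers per random projection matrix."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(num_members):
        matrix = make_rp_matrix(out_dim, len(X[0]), rng)
        projected = [project(matrix, x) for x in X]
        ensemble.append((matrix, train_one_vs_rest(projected, label_sets, num_labels)))
    return ensemble


def fuse(score_lists):
    """Average the per-label scores across ensemble members (assumed fusion rule)."""
    return [sum(scores) / len(scores) for scores in zip(*score_lists)]


def predict(ensemble, x):
    """Return (predicted label set, fused scores) for a single feature vector."""
    member_scores = []
    for matrix, scorers in ensemble:
        z = project(matrix, x)
        member_scores.append(
            [sum(wi * zi for wi, zi in zip(w, z)) + b for w, b in scorers]
        )
    fused = fuse(member_scores)
    return {k for k, s in enumerate(fused) if s > 0.0}, fused


if __name__ == "__main__":
    # Toy 8-dimensional data with two labels; the last sample carries both.
    X = [
        [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.9, 1.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    ]
    Y = [{0}, {0}, {1}, {1}, {0, 1}]
    ensemble = train_ensemble(X, Y, num_labels=2, out_dim=3, num_members=5)
    print(predict(ensemble, X[0]))
```

Each ensemble member sees the data through a different random matrix, so averaging their scores is one plausible way to reduce the variance that any single projection introduces. A library SVM could replace `train_one_vs_rest` without changing the rest of the sketch.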

Original language: English (US)
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5999-6003
Number of pages: 5
ISBN (Print): 9781479928927
DOIs: https://doi.org/10.1109/ICASSP.2014.6854755
State: Published - 2014
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: May 4 2014 → May 9 2014

Other

Other: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country: Italy
City: Florence
Period: 5/4/14 → 5/9/14

Fingerprint

Labels
Proteins
Classifiers
Support vector machines
Ontology
Genes

Keywords

  • Dimension reduction
  • Multi-label classification
  • Protein subcellular localization
  • Random projection
  • Support vector machines

ASJC Scopus subject areas

  • Signal Processing
  • Software
  • Electrical and Electronic Engineering

Cite this

Wan, S., Mak, M. W., Zhang, B., Wang, Y., & Kung, S. Y. (2014). Ensemble random projection for multi-label classification with application to protein subcellular localization. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings (pp. 5999-6003). [6854755] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICASSP.2014.6854755
