An ensemble classifier with random projection for predicting multi-label protein subcellular localization

Shibiao Wan, Man Wai Mak, Bai Zhang, Yue Wang, Sun Yuan Kung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

In protein subcellular localization prediction, a predominant scenario is that the number of available features is much larger than the number of data samples. Many of these features may contain redundant or irrelevant information, causing prediction systems to suffer from overfitting. To address this problem, this paper proposes a dimensionality-reduction method that applies random projection (RP) to construct an ensemble multi-label classifier for predicting protein subcellular localization. Specifically, the frequencies of occurrence of gene-ontology terms are used as feature vectors, which are projected onto lower-dimensional spaces by random projection matrices whose elements follow a distribution with zero mean and unit variance. The transformed low-dimensional vectors are classified by an ensemble of one-vs-rest multi-label support vector machine (SVM) classifiers, each corresponding to one of the RP matrices. The scores obtained from the ensemble are then fused to make the final decision. Experimental results on two recent datasets suggest that the proposed method can reduce the feature dimension sixfold and remarkably improve classification performance.
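The pipeline described in the abstract (random projection, one classifier set per projection matrix, score fusion) can be illustrated with a minimal sketch. This is not the authors' implementation. It makes several assumptions: the projection entries are Gaussian (the abstract only specifies zero mean and unit variance), the matrix is scaled by 1/sqrt(d_out) as a common convention, fusion is a plain average of scores, and a regularized least-squares scorer stands in for the SVMs so that the example needs only NumPy. All function names, the toy data, and the rule that every sample receives at least one label are hypothetical choices made for illustration.

```python
import numpy as np

def make_rp_matrix(rng, d_out, d_in):
    """Random projection matrix with i.i.d. zero-mean, unit-variance entries.

    Gaussian entries and the 1/sqrt(d_out) scaling are assumptions of this sketch.
    """
    return rng.standard_normal((d_out, d_in)) / np.sqrt(d_out)

def add_bias(X):
    """Append a constant column so each linear scorer has a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_ovr_scorers(X, Y, reg=1.0):
    """Fit one linear scorer per label (one-vs-rest).

    Stand-in for the SVMs: ridge regression onto +1/-1 targets.
    Returns a weight matrix of shape (n_features + 1, n_labels).
    """
    Xb = add_bias(X)
    targets = 2.0 * Y - 1.0
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)

def fit_ensemble(X, Y, d_out, n_members, seed=0):
    """Train one set of one-vs-rest scorers per random projection matrix."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        R = make_rp_matrix(rng, d_out, X.shape[1])
        W = fit_ovr_scorers(X @ R.T, Y)
        members.append((R, W))
    return members

def predict(members, X, threshold=0.0):
    """Fuse per-member scores by averaging, then threshold each label."""
    scores = [add_bias(X @ R.T) @ W for R, W in members]
    fused = np.mean(scores, axis=0)
    labels = fused > threshold
    # Assumed rule: each sample keeps at least its top-scoring label.
    labels[np.arange(X.shape[0]), np.argmax(fused, axis=1)] = True
    return fused, labels.astype(int)

if __name__ == "__main__":
    # Toy data: 40 samples, 300 count-valued features, 3 possible locations.
    rng = np.random.default_rng(1)
    X = rng.poisson(1.0, size=(40, 300)).astype(float)
    Y = rng.integers(0, 2, size=(40, 3))
    members = fit_ensemble(X, Y, d_out=50, n_members=5)  # 300 -> 50 dimensions
    fused, labels = predict(members, X)
    print(fused.shape, labels.shape)
```

In practice the least-squares scorers would be replaced by SVMs, and the projected dimension and ensemble size would be chosen by validation; the values used here are arbitrary.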

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013
Pages: 35-42
Number of pages: 8
State: Published - 2013
Externally published: Yes
Event: 2013 IEEE International Conference on Bioinformatics and Biomedicine, IEEE BIBM 2013 - Shanghai, China
Duration: Dec 18 2013 to Dec 21 2013

Keywords

  • Dimension reduction
  • Multi-label classification
  • Protein subcellular localization
  • Random projection
  • Support vector machines

ASJC Scopus subject areas

  • Biomedical Engineering
