Random bits forest: A strong classifier/regressor for big data

Yi Wang, Yi Li, Weilin Pu, Kathryn Wen, Yin Yao Shugart, Momiao Xiong, Li Jin

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ∼10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). RBF also performed well when tested on an independent dataset, a real psoriasis genome-wide association study (GWAS).
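
The abstract only outlines the pipeline (random shallow networks as feature generators, a boosting-style selection step, then a random forest on the resulting binary features). The snippet below is a minimal, illustrative Python sketch of that idea, not the authors' implementation: the network size, the median-threshold binarization, the greedy correlation-based selection in `select_bits`, and all function names and parameters are assumptions made for illustration only.

```python
# Illustrative sketch of the Random Bits Forest idea (NOT the paper's code):
# 1) random 3-layer networks turn the raw features into binary "bits",
# 2) a greedy boosting-flavoured pass keeps the bits that reduce the residual,
# 3) a standard random forest is trained on the selected bits.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def random_bits(X, n_bits=200, hidden=3, rng=None):
    """Project X through small random networks and threshold the output to bits."""
    rng = np.random.default_rng(rng)
    bits = np.empty((X.shape[0], n_bits))
    params = []
    for j in range(n_bits):
        W1 = rng.normal(size=(X.shape[1], hidden))  # input -> hidden weights
        w2 = rng.normal(size=hidden)                # hidden -> output weights
        score = np.tanh(X @ W1) @ w2                # network output
        thresh = np.median(score)                   # binarize at the median (assumption)
        bits[:, j] = (score > thresh).astype(float)
        params.append((W1, w2, thresh))
    return bits, params


def select_bits(bits, y, n_keep=50):
    """Greedy selection: repeatedly keep the bit most correlated with the
    current residual, then remove the component it explains."""
    residual = y - y.mean()
    chosen = []
    for _ in range(n_keep):
        centered = bits - bits.mean(axis=0)
        corr = np.abs(centered.T @ residual)
        corr[chosen] = -np.inf                      # never pick the same bit twice
        best = int(np.argmax(corr))
        chosen.append(best)
        b = centered[:, best]
        residual = residual - (b @ residual) / (b @ b + 1e-12) * b
    return chosen


# Toy usage on synthetic data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bits_tr, params = random_bits(X_tr, rng=0)
keep = select_bits(bits_tr, y_tr.astype(float))

# Recompute the kept bits on the test set with the same random networks.
bits_te = np.column_stack([
    ((np.tanh(X_te @ W1) @ w2) > t).astype(float)
    for W1, w2, t in [params[k] for k in keep]
])

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(bits_tr[:, keep], y_tr)
print("test accuracy:", forest.score(bits_te, y_te))
```

In this sketch the random forest sees only the selected binary features, which mirrors the paper's high-level description of feeding the selected networks into a modified random forest; the actual selection scheme in the paper is gradient boosting, which this toy residual-correlation heuristic only approximates.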

Original language: English (US)
Article number: 30086
Journal: Scientific Reports
Volume: 6
DOIs
State: Published - Jul 22 2016
Externally published: Yes

ASJC Scopus subject areas

  • General
