TY - JOUR
T1 - Decentralized Distribution-sampled Classification Models with Application to Brain Imaging
AU - Lewis, Noah
AU - Gazula, Harshvardhan
AU - Plis, Sergey M.
AU - Calhoun, Vince D.
N1 - Publisher Copyright:
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Copyright:
Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2019/3/13
Y1 - 2019/3/13
AB - Background: In this age of big data, large data stores allow researchers to build robust models that are accurate and informative. In many cases, however, the data are stored in separate locations, and transferring them between local sites raises practical hurdles such as privacy concerns and heavy network load. This is especially true for medical imaging data, whose sharing can be constrained by the Health Insurance Portability and Accountability Act (HIPAA). Medical imaging datasets can also contain many thousands or millions of features, which further increases network load. New Method: Our research expands upon current decentralized classification research by implementing a new single-shot method for both neural networks and support vector machines. Our approach estimates the statistical distribution of the data at each local site and passes this information to the other sites; each site then resamples from these distributions and trains a model on both its locally available data and the resampled data. Results: We apply our approach to handwritten digit classification and to multi-subject classification of brain imaging data collected from patients with schizophrenia and healthy controls. Overall, the results showed classification accuracy comparable to that of the centralized model, with lower network load than multi-shot methods. Comparison with Existing Methods: Many decentralized classifiers are multi-shot, requiring heavy network traffic. Our model aims to alleviate this load while preserving prediction accuracy. Conclusions: Our proposed approach performs comparably to a centralized approach while reducing network traffic relative to multi-shot methods.
Highlights: A novel yet simple approach to decentralized classification; Reduces total network load compared to current multi-shot algorithms; Maintains prediction accuracy comparable to the centralized approach.
UR - http://www.scopus.com/inward/record.url?scp=85094093153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094093153&partnerID=8YFLogxK
U2 - 10.1101/576108
DO - 10.1101/576108
M3 - Article
AN - SCOPUS:85094093153
JO - bioRxiv
JF - bioRxiv
ER -