TY - GEN
T1 - Shapelet ensemble for multi-dimensional time series
AU - Cetin, Mustafa S.
AU - Mueen, Abdullah
AU - Calhoun, Vince D.
N1 - Publisher Copyright:
Copyright © SIAM.
PY - 2015
Y1 - 2015
N2 - Time series shapelets are small subsequences that maximally differentiate classes of time series. Since the inception of shapelets, researchers have used shapelets for various data domains including anthropology and health care, and in the process suggested many efficient techniques for shapelet discovery. However, multi-dimensional time series data poses unique challenges to shapelet discovery that are yet to be solved. We show that an ensemble of shapelet-based decision trees on individual dimensions works better than shapelets defined over multiple dimensions. Generating a shapelet ensemble for multidimensional time series is computationally expensive. Most of the existing techniques prune shapelet candidates for speed. In this paper, we propose a novel technique for shapelet discovery that evaluates remaining candidates efficiently. Our algorithm uses a multi-length approximate index for time series data to efficiently find the nearest neighbors of the candidate shapelets. We employ a simple skipping technique for additional candidate pruning and a voting based technique to improve accuracy while retaining interpretability. Not only do we find a significant speed increase, our techniques enable us to efficiently discover shapelets on datasets with multi-dimensional and long time series such as hours of brain activity recordings. We demonstrate our approach on a biomedical dataset and find significant differences between patients with schizophrenia and healthy controls.
AB - Time series shapelets are small subsequences that maximally differentiate classes of time series. Since the inception of shapelets, researchers have used shapelets for various data domains including anthropology and health care, and in the process suggested many efficient techniques for shapelet discovery. However, multi-dimensional time series data poses unique challenges to shapelet discovery that are yet to be solved. We show that an ensemble of shapelet-based decision trees on individual dimensions works better than shapelets defined over multiple dimensions. Generating a shapelet ensemble for multidimensional time series is computationally expensive. Most of the existing techniques prune shapelet candidates for speed. In this paper, we propose a novel technique for shapelet discovery that evaluates remaining candidates efficiently. Our algorithm uses a multi-length approximate index for time series data to efficiently find the nearest neighbors of the candidate shapelets. We employ a simple skipping technique for additional candidate pruning and a voting based technique to improve accuracy while retaining interpretability. Not only do we find a significant speed increase, our techniques enable us to efficiently discover shapelets on datasets with multi-dimensional and long time series such as hours of brain activity recordings. We demonstrate our approach on a biomedical dataset and find significant differences between patients with schizophrenia and healthy controls.
UR - http://www.scopus.com/inward/record.url?scp=84946794479&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946794479&partnerID=8YFLogxK
U2 - 10.1137/1.9781611974010.35
DO - 10.1137/1.9781611974010.35
M3 - Conference contribution
AN - SCOPUS:84946794479
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 307
EP - 315
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Venkatasubramanian, Suresh
A2 - Ye, Jieping
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -