Shapelet ensemble for multi-dimensional time series

Mustafa S. Cetin, Abdullah Mueen, Vince Daniel Calhoun

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Time series shapelets are small subsequences that maximally differentiate classes of time series. Since the inception of shapelets, researchers have used them in various data domains, including anthropology and health care, and in the process have suggested many efficient techniques for shapelet discovery. However, multi-dimensional time series data poses unique challenges to shapelet discovery that are yet to be solved. We show that an ensemble of shapelet-based decision trees on individual dimensions works better than shapelets defined over multiple dimensions. Generating a shapelet ensemble for multi-dimensional time series is computationally expensive. Most existing techniques prune shapelet candidates for speed. In this paper, we propose a novel technique for shapelet discovery that evaluates the remaining candidates efficiently. Our algorithm uses a multi-length approximate index for time series data to efficiently find the nearest neighbors of the candidate shapelets. We employ a simple skipping technique for additional candidate pruning and a voting-based technique to improve accuracy while retaining interpretability. Not only do we achieve a significant speedup, but our techniques also enable us to efficiently discover shapelets in datasets with multi-dimensional and long time series, such as hours of brain activity recordings. We demonstrate our approach on a biomedical dataset and find significant differences between patients with schizophrenia and healthy controls.
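The abstract describes shapelets as subsequences that separate classes, and an ensemble of per-dimension shapelet classifiers combined by voting. The following is a minimal sketch of that general idea only, not the authors' algorithm. It assumes z-normalized Euclidean distance between a shapelet and its best-matching window, scores candidates by information gain, uses a single-split decision stump per dimension in place of a full decision tree, combines dimensions by plain majority vote, and evaluates every candidate by brute force instead of the multi-length approximate index and skipping technique the paper proposes. All function names (`subseq_dist`, `fit_stump`, `fit_ensemble`, `predict_ensemble`, and so on) are hypothetical.

```python
import math
from collections import Counter

# Illustrative sketch (assumption): brute-force shapelet stumps, one per
# dimension, combined by majority vote. Not the paper's indexed algorithm.


def znorm(x):
    """Z-normalize a sequence; a flat sequence maps to all zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd < 1e-8:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def subseq_dist(shapelet, series):
    """Smallest z-normalized Euclidean distance between the shapelet and
    any equal-length window of the series (brute-force sliding window)."""
    m = len(shapelet)
    s = znorm(shapelet)
    best = math.inf
    for i in range(len(series) - m + 1):
        w = znorm(series[i:i + m])
        best = min(best, math.sqrt(sum((a - b) ** 2 for a, b in zip(s, w))))
    return best


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def majority(ys):
    """Most common label; ties go to the label seen first."""
    return Counter(ys).most_common(1)[0][0]


def best_split(dists, labels):
    """Distance threshold that maximizes information gain."""
    pairs = sorted(zip(dists, labels), key=lambda p: p[0])
    base, n = entropy(labels), len(pairs)
    best_gain, best_thr = -1.0, None
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal distances
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        gain = base - (k * entropy(left) + (n - k) * entropy(right)) / n
        if gain > best_gain:
            best_gain = gain
            best_thr = (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_gain, best_thr


def fit_stump(series_list, labels, lengths):
    """Try every window of every training series as a candidate shapelet
    and keep the one whose distance split has the highest gain."""
    best = None
    for ts in series_list:
        for m in lengths:
            for i in range(len(ts) - m + 1):
                cand = ts[i:i + m]
                dists = [subseq_dist(cand, s) for s in series_list]
                gain, thr = best_split(dists, labels)
                if thr is not None and (best is None or gain > best[0]):
                    best = (gain, cand, thr, dists)
    if best is None:
        raise ValueError("no candidate separates the training series")
    _, shapelet, thr, dists = best
    near = [y for d, y in zip(dists, labels) if d <= thr]
    far = [y for d, y in zip(dists, labels) if d > thr]
    return shapelet, thr, majority(near), majority(far)


def predict_stump(stump, ts):
    shapelet, thr, near_label, far_label = stump
    return near_label if subseq_dist(shapelet, ts) <= thr else far_label


def fit_ensemble(X, y, lengths):
    """One stump per dimension; X[i][d] is dimension d of instance i."""
    n_dims = len(X[0])
    return [fit_stump([x[d] for x in X], y, lengths) for d in range(n_dims)]


def predict_ensemble(model, x):
    """Majority vote over the per-dimension stumps."""
    return majority([predict_stump(st, x[d]) for d, st in enumerate(model)])
```

Usage: with `X[i][d]` holding dimension `d` of instance `i` as a list of numbers, `model = fit_ensemble(X, y, lengths=(3,))` followed by `predict_ensemble(model, x)` returns the majority label across dimensions. Because every candidate is compared against every window of every training series, the cost grows quickly with series length and the number of dimensions; this is the kind of expense the abstract says the paper's index and candidate skipping are meant to reduce.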

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 307-315
Number of pages: 9
ISBN (Print): 9781510811522
State: Published - 2015
Externally published: Yes
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 to May 2 2015

Fingerprint

Time series
Decision trees
Health care
Brain

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Cetin, M. S., Mueen, A., & Calhoun, V. D. (2015). Shapelet ensemble for multi-dimensional time series. In SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 307-315). Society for Industrial and Applied Mathematics Publications.

@inproceedings{7eafbc0594b54dc2b1b3472db9150028,
title = "Shapelet ensemble for multi-dimensional time series",
author = "Cetin, {Mustafa S.} and Abdullah Mueen and Calhoun, {Vince Daniel}",
year = "2015",
language = "English (US)",
isbn = "9781510811522",
pages = "307--315",
booktitle = "SIAM International Conference on Data Mining 2015, SDM 2015",
publisher = "Society for Industrial and Applied Mathematics Publications",

}

TY - GEN

T1 - Shapelet ensemble for multi-dimensional time series

AU - Cetin, Mustafa S.

AU - Mueen, Abdullah

AU - Calhoun, Vince Daniel

PY - 2015

Y1 - 2015

AB - Time series shapelets are small subsequences that maximally differentiate classes of time series. Since the inception of shapelets, researchers have used them in various data domains, including anthropology and health care, and in the process have suggested many efficient techniques for shapelet discovery. However, multi-dimensional time series data poses unique challenges to shapelet discovery that are yet to be solved. We show that an ensemble of shapelet-based decision trees on individual dimensions works better than shapelets defined over multiple dimensions. Generating a shapelet ensemble for multi-dimensional time series is computationally expensive. Most existing techniques prune shapelet candidates for speed. In this paper, we propose a novel technique for shapelet discovery that evaluates the remaining candidates efficiently. Our algorithm uses a multi-length approximate index for time series data to efficiently find the nearest neighbors of the candidate shapelets. We employ a simple skipping technique for additional candidate pruning and a voting-based technique to improve accuracy while retaining interpretability. Not only do we achieve a significant speedup, but our techniques also enable us to efficiently discover shapelets in datasets with multi-dimensional and long time series, such as hours of brain activity recordings. We demonstrate our approach on a biomedical dataset and find significant differences between patients with schizophrenia and healthy controls.

UR - http://www.scopus.com/inward/record.url?scp=84946794479&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946794479&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84946794479

SN - 9781510811522

SP - 307

EP - 315

BT - SIAM International Conference on Data Mining 2015, SDM 2015

PB - Society for Industrial and Applied Mathematics Publications

ER -