TY - GEN
T1 - Temporal convolutional networks: A unified approach to action segmentation
T2 - Computer Vision – ECCV 2016 Workshops, Proceedings
AU - Lea, Colin
AU - Vidal, René
AU - Reiter, Austin
AU - Hager, Gregory D.
N1 - Publisher Copyright:
© Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
AB - The dominant paradigm for video-based action segmentation is composed of two steps: first, compute low-level features for each frame using Dense Trajectories or a Convolutional Neural Network to encode local spatiotemporal information, and second, input these features into a classifier such as a Recurrent Neural Network (RNN) that captures high-level temporal relationships. While often effective, this decoupling requires specifying two separate models, each with their own complexities, and prevents capturing more nuanced long-range spatiotemporal relationships. We propose a unified approach, as demonstrated by our Temporal Convolutional Network (TCN), that hierarchically captures relationships at low-, intermediate-, and high-level time-scales. Our model achieves superior or competitive performance using video or sensor data on three public action segmentation datasets and can be trained in a fraction of the time it takes to train an RNN.
UR - http://www.scopus.com/inward/record.url?scp=85005942737&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85005942737&partnerID=8YFLogxK
DO - 10.1007/978-3-319-49409-8_7
M3 - Conference contribution
AN - SCOPUS:85005942737
SN - 9783319494081
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 47
EP - 54
BT - Computer Vision – ECCV 2016 Workshops, Proceedings
A2 - Hua, Gang
A2 - Jégou, Hervé
PB - Springer-Verlag
Y2 - 8 October 2016 through 16 October 2016
ER -