Segmental spatiotemporal CNNs for fine-grained action segmentation

Colin Lea, Austin Reiter, René Vidal, Gregory D. Hager

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

Joint segmentation and classification of fine-grained actions is important for applications of human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large-scale action classification, the performance of state-of-the-art fine-grained action recognition approaches remains low. We propose a model for action segmentation which combines low-level spatiotemporal features with a high-level segmental classifier. Our spatiotemporal CNN consists of a spatial component that represents relationships between objects and a temporal component that uses large 1D convolutional filters to capture how object relationships change across time. These features are used in tandem with a semi-Markov model that captures transitions from one action to another. We introduce an efficient constrained segmental inference algorithm for this model that is orders of magnitude faster than the current approach. We highlight the effectiveness of our Segmental Spatiotemporal CNN on cooking and surgical action datasets for which we observe substantially improved performance relative to recent baseline methods.
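The abstract pairs per-frame features with a semi-Markov model and a constrained segmental inference step. Below is a minimal sketch of one way such constrained inference can be done. It is not the authors' code and makes two assumptions: a labeling's score is the sum of per-frame class scores plus a transition score between each pair of consecutive segments, and the number of segments is capped at K. The function and parameter names (`constrained_segmental_inference`, `frame_scores`, `transition`, `max_segments`) are hypothetical. The dynamic program keeps the best score for each frame, segment count, and current class, so it runs in O(T·K·C²) time for T frames and C classes, with no dependence on a maximum segment duration.

```python
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def constrained_segmental_inference(
    frame_scores: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
    max_segments: int,
) -> Tuple[List[int], float]:
    """Hypothetical sketch: best per-frame labeling using at most `max_segments` segments.

    frame_scores[t][c]: score of class c at frame t (assumed, e.g. CNN log-probabilities).
    transition[a][b]: assumed score for a segment of class a followed by one of class b.
    Consecutive segments must have different classes. Cost is O(T * K * C^2).
    """
    T, C, K = len(frame_scores), len(frame_scores[0]), max_segments
    # V[t][k][c]: best score for frames 0..t using k + 1 segments, the last one of class c.
    V = [[[NEG_INF] * C for _ in range(K)] for _ in range(T)]
    # back[t][k][c]: (segment index, class) of the best predecessor state at frame t - 1.
    back = [[[(-1, -1)] * C for _ in range(K)] for _ in range(T)]
    for c in range(C):
        V[0][0][c] = frame_scores[0][c]

    for t in range(1, T):
        for k in range(K):
            for c in range(C):
                # Option 1: frame t extends the current segment of class c.
                best, arg = V[t - 1][k][c], (k, c)
                # Option 2: frame t starts segment k with class c after a different class.
                if k > 0:
                    for cp in range(C):
                        if cp == c:
                            continue
                        cand = V[t - 1][k - 1][cp] + transition[cp][c]
                        if cand > best:
                            best, arg = cand, (k - 1, cp)
                if best > NEG_INF:
                    V[t][k][c] = best + frame_scores[t][c]
                    back[t][k][c] = arg

    # Best final state over every allowed segment count and class.
    score, k, c = max((V[T - 1][k][c], k, c) for k in range(K) for c in range(C))
    labels = [0] * T
    for t in range(T - 1, -1, -1):
        labels[t] = c
        if t > 0:
            k, c = back[t][k][c]
    return labels, score


# Toy example: two classes, five frames; frame 3 is noisy and slightly favors class 0.
scores = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.1, 1.0], [0.0, 2.0]]
no_transition_cost = [[0.0, 0.0], [0.0, 0.0]]
print(constrained_segmental_inference(scores, no_transition_cost, max_segments=2))
# ([0, 0, 1, 1, 1], 9.0)
print(constrained_segmental_inference(scores, no_transition_cost, max_segments=4)[0])
# [0, 0, 1, 0, 1]
```

In this toy example, capping the labeling at two segments absorbs the noisy fourth frame into the surrounding segment, while a cap of four lets the labeling follow the per-frame maxima and produce a short spurious segment. This illustrates why a segment-count constraint can suppress over-segmentation, under the scoring assumptions stated above.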

Original language: English (US)
Title of host publication: Computer Vision - 14th European Conference, ECCV 2016, Proceedings
Editors: Jiri Matas, Nicu Sebe, Max Welling, Bastian Leibe
Publisher: Springer Verlag
Pages: 36-52
Number of pages: 17
ISBN (Print): 9783319464862
DOIs: https://doi.org/10.1007/978-3-319-46487-9_3
State: Published - Jan 1 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9907 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)


Cite this

Lea, C., Reiter, A., Vidal, R., & Hager, G. D. (2016). Segmental spatiotemporal CNNs for fine-grained action segmentation. In J. Matas, N. Sebe, M. Welling, & B. Leibe (Eds.), Computer Vision - 14th European Conference, ECCV 2016, Proceedings (pp. 36-52). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9907 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-46487-9_3