TY - GEN
T1 - Learning convolutional action primitives for fine-grained action recognition
AU - Lea, Colin
AU - Vidal, René
AU - Hager, Gregory D.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/6/8
Y1 - 2016/6/8
N2 - Fine-grained action recognition is important for many applications, including human-robot interaction, automated skill assessment, and surveillance. The goal is to segment and classify all actions occurring in a time series sequence. While recent recognition methods have shown strong performance in robotics applications, they often require hand-crafted features, use large amounts of domain knowledge, or employ overly simplistic representations of how objects change throughout an action. In this paper we present the Latent Convolutional Skip Chain Conditional Random Field (LC-SC-CRF). This time series model learns a set of interpretable and composable action primitives from sensor data. We apply our model to cooking tasks using accelerometer data from the University of Dundee 50 Salads dataset and to robotic surgery training tasks using robot kinematic data from the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our performance on 50 Salads and JIGSAWS is 18.0% and 5.3% higher than the state of the art, respectively. This model performs well without requiring hand-crafted features or intricate domain knowledge. The code and features have been made public.
AB - Fine-grained action recognition is important for many applications, including human-robot interaction, automated skill assessment, and surveillance. The goal is to segment and classify all actions occurring in a time series sequence. While recent recognition methods have shown strong performance in robotics applications, they often require hand-crafted features, use large amounts of domain knowledge, or employ overly simplistic representations of how objects change throughout an action. In this paper we present the Latent Convolutional Skip Chain Conditional Random Field (LC-SC-CRF). This time series model learns a set of interpretable and composable action primitives from sensor data. We apply our model to cooking tasks using accelerometer data from the University of Dundee 50 Salads dataset and to robotic surgery training tasks using robot kinematic data from the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our performance on 50 Salads and JIGSAWS is 18.0% and 5.3% higher than the state of the art, respectively. This model performs well without requiring hand-crafted features or intricate domain knowledge. The code and features have been made public.
UR - http://www.scopus.com/inward/record.url?scp=84977479301&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84977479301&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2016.7487305
DO - 10.1109/ICRA.2016.7487305
M3 - Conference contribution
AN - SCOPUS:84977479301
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 1642
EP - 1649
BT - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
Y2 - 16 May 2016 through 21 May 2016
ER -