TY - JOUR
T1 - Segmenting and classifying activities in robot-assisted surgery with recurrent neural networks
AU - DiPietro, Robert
AU - Ahmidi, Narges
AU - Malpani, Anand
AU - Waldram, Madeleine
AU - Lee, Gyusung I.
AU - Lee, Mija R.
AU - Vedula, S. Swaroop
AU - Hager, Gregory D.
N1 - Funding Information:
This research was supported by NSF Grant OISE-1065092, "A US-Germany Research Collaboration on Systems for Computer-Integrated Healthcare," and by a fellowship for modeling, simulation, and training from the Link Foundation (Grant No. 90078471).
Publisher Copyright:
© 2019, CARS.
PY - 2019/11/1
Y1 - 2019/11/1
N2 - Purpose: Automatically segmenting and classifying surgical activities is an important prerequisite to providing automated, targeted assessment and feedback during surgical training. Prior work has focused almost exclusively on recognizing gestures, or short, atomic units of activity such as pushing needle through tissue, whereas we also focus on recognizing higher-level maneuvers, such as suture throw. Maneuvers exhibit more complexity and variability than the gestures from which they are composed; however, working at this granularity has the benefit of being consistent with existing training curricula. Methods: Prior work has focused on methods based on hidden Markov models and conditional random fields, which typically leverage unary terms that are local in time and linear in model parameters. Because maneuvers are governed by long-term, nonlinear dynamics, we argue that the more expressive unary terms offered by recurrent neural networks (RNNs) are better suited for this task. Four RNN architectures are compared for recognizing activities from kinematics: simple RNNs, long short-term memory, gated recurrent units, and mixed history RNNs. We report performance in terms of error rate and edit distance, and we use a functional analysis-of-variance framework to assess hyperparameter sensitivity for each architecture. Results: We obtain state-of-the-art performance for both maneuver recognition from kinematics (4 maneuvers; error rate of 8.6 ± 3.4%; normalized edit distance of 9.3 ± 4.3%) and gesture recognition from kinematics (10 gestures; error rate of 15.2 ± 6.0%; normalized edit distance of 8.4 ± 6.3%). Conclusions: Automated maneuver recognition is feasible with RNNs, an exciting result that offers the opportunity to provide targeted assessment and feedback at a higher level of granularity. In addition, we show that multiple hyperparameters are important for achieving good performance, and our hyperparameter analysis serves to aid future work in RNN-based activity recognition.
AB - Purpose: Automatically segmenting and classifying surgical activities is an important prerequisite to providing automated, targeted assessment and feedback during surgical training. Prior work has focused almost exclusively on recognizing gestures, or short, atomic units of activity such as pushing needle through tissue, whereas we also focus on recognizing higher-level maneuvers, such as suture throw. Maneuvers exhibit more complexity and variability than the gestures from which they are composed; however, working at this granularity has the benefit of being consistent with existing training curricula. Methods: Prior work has focused on methods based on hidden Markov models and conditional random fields, which typically leverage unary terms that are local in time and linear in model parameters. Because maneuvers are governed by long-term, nonlinear dynamics, we argue that the more expressive unary terms offered by recurrent neural networks (RNNs) are better suited for this task. Four RNN architectures are compared for recognizing activities from kinematics: simple RNNs, long short-term memory, gated recurrent units, and mixed history RNNs. We report performance in terms of error rate and edit distance, and we use a functional analysis-of-variance framework to assess hyperparameter sensitivity for each architecture. Results: We obtain state-of-the-art performance for both maneuver recognition from kinematics (4 maneuvers; error rate of 8.6 ± 3.4%; normalized edit distance of 9.3 ± 4.3%) and gesture recognition from kinematics (10 gestures; error rate of 15.2 ± 6.0%; normalized edit distance of 8.4 ± 6.3%). Conclusions: Automated maneuver recognition is feasible with RNNs, an exciting result that offers the opportunity to provide targeted assessment and feedback at a higher level of granularity. In addition, we show that multiple hyperparameters are important for achieving good performance, and our hyperparameter analysis serves to aid future work in RNN-based activity recognition.
KW - Gesture recognition
KW - Maneuver recognition
KW - Recurrent neural networks
KW - Robot-assisted surgery
KW - Surgical activity recognition
UR - http://www.scopus.com/inward/record.url?scp=85065182735&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065182735&partnerID=8YFLogxK
U2 - 10.1007/s11548-019-01953-x
DO - 10.1007/s11548-019-01953-x
M3 - Article
C2 - 31037493
AN - SCOPUS:85065182735
VL - 14
SP - 2005
EP - 2020
JO - International Journal of Computer Assisted Radiology and Surgery
JF - International Journal of Computer Assisted Radiology and Surgery
SN - 1861-6410
IS - 11
ER -