TY - GEN
T1 - A Unified Framework for Multi-view Multi-class Object Pose Estimation
AU - Li, Chi
AU - Bai, Jin
AU - Hager, Gregory D.
N1 - Funding Information:
Acknowledgments. This work is supported by the IARPA DIVA program and the National Science Foundation under grants IIS-127228 and IIS-1637949.
Publisher Copyright:
© 2018, Springer Nature Switzerland AG.
PY - 2018
Y1 - 2018
N2 - One[NOSPACE] [NOSPACE][SPACE]core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.
AB - One[NOSPACE] [NOSPACE][SPACE]core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.
KW - Deep learning
KW - Multi-view recognition
KW - Object pose estimation
UR - http://www.scopus.com/inward/record.url?scp=85055093674&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055093674&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-01270-0_16
DO - 10.1007/978-3-030-01270-0_16
M3 - Conference contribution
AN - SCOPUS:85055093674
SN - 9783030012694
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 263
EP - 281
BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
A2 - Weiss, Yair
A2 - Ferrari, Vittorio
A2 - Sminchisescu, Cristian
A2 - Hebert, Martial
PB - Springer Verlag
T2 - 15th European Conference on Computer Vision, ECCV 2018
Y2 - 8 September 2018 through 14 September 2018
ER -