A Unified Framework for Multi-view Multi-class Object Pose Estimation

Chi Li; Jin Bai; Gregory D. Hager

doi:10.1007/978-3-030-01270-0_16

Whiting School of Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

One core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.
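The abstract's idea of combining classification with regression over a uniform tessellation can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: it assumes a single rotation angle on a circle instead of SE(3), and the function names `encode` and `decode` are hypothetical. The angle is split into a discrete bin label (the classification target) and a small residual from the bin center (the regression target).

```python
import math

TWO_PI = 2 * math.pi


def encode(angle, num_bins):
    """Split an angle into a bin label and a residual from the bin center.

    Assumption: a 1-D uniform tessellation of [0, 2*pi) stands in for the
    tessellation of SE(3) mentioned in the abstract.
    """
    width = TWO_PI / num_bins
    angle = angle % TWO_PI
    # Clamp guards against floating-point rounding at the upper boundary.
    label = min(int(angle // width), num_bins - 1)
    center = (label + 0.5) * width
    return label, angle - center


def decode(label, residual, num_bins):
    """Recover the angle from a bin label and a residual."""
    width = TWO_PI / num_bins
    return ((label + 0.5) * width + residual) % TWO_PI


label, residual = encode(1.0, num_bins=8)
recovered = decode(label, residual, num_bins=8)
```

In this toy setting a classifier would predict `label` and a regressor would predict `residual`, whose magnitude is bounded by half a bin width; finer tessellations shrink the range the regressor has to cover.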

Original language	English (US)
Title of host publication	Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings
Editors	Yair Weiss, Vittorio Ferrari, Cristian Sminchisescu, Martial Hebert
Publisher	Springer Verlag
Pages	263-281
Number of pages	19
ISBN (Print)	9783030012694
DOIs	https://doi.org/10.1007/978-3-030-01270-0_16
State	Published - 2018
Event	15th European Conference on Computer Vision, ECCV 2018 - Munich, Germany
Duration	Sep 8 2018 → Sep 14 2018

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	11220 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Other

Other	15th European Conference on Computer Vision, ECCV 2018
Country/Territory	Germany
City	Munich
Period	9/8/18 → 9/14/18

Keywords

Deep learning
Multi-view recognition
Object pose estimation

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to Document

10.1007/978-3-030-01270-0_16

Cite this

Li, C., Bai, J., & Hager, G. D. (2018). A Unified Framework for Multi-view Multi-class Object Pose Estimation. In Y. Weiss, V. Ferrari, C. Sminchisescu, & M. Hebert (Eds.), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings (pp. 263-281). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11220 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-01270-0_16

A Unified Framework for Multi-view Multi-class Object Pose Estimation. / Li, Chi; Bai, Jin; Hager, Gregory D.
Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. ed. / Yair Weiss; Vittorio Ferrari; Cristian Sminchisescu; Martial Hebert. Springer Verlag, 2018. p. 263-281 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11220 LNCS).

Li, C, Bai, J & Hager, GD 2018, A Unified Framework for Multi-view Multi-class Object Pose Estimation. in Y Weiss, V Ferrari, C Sminchisescu & M Hebert (eds), Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11220 LNCS, Springer Verlag, pp. 263-281, 15th European Conference on Computer Vision, ECCV 2018, Munich, Germany, 9/8/18. https://doi.org/10.1007/978-3-030-01270-0_16

Li C, Bai J, Hager GD. A Unified Framework for Multi-view Multi-class Object Pose Estimation. In Weiss Y, Ferrari V, Sminchisescu C, Hebert M, editors, Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. Springer Verlag. 2018. p. 263-281. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-01270-0_16

Li, Chi ; Bai, Jin ; Hager, Gregory D. / A Unified Framework for Multi-view Multi-class Object Pose Estimation. Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings. editor / Yair Weiss ; Vittorio Ferrari ; Cristian Sminchisescu ; Martial Hebert. Springer Verlag, 2018. pp. 263-281 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{5f1750e76ecf4b3885dfaa61e75cca36,

title = "A Unified Framework for Multi-view Multi-class Object Pose Estimation",

abstract = "One core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.",

keywords = "Deep learning, Multi-view recognition, Object pose estimation",

author = "Chi Li and Jin Bai and Hager, {Gregory D.}",

note = "Funding Information: Acknowledgments. This work is supported by the IARPA DIVA program and the National Science Foundation under grants IIS-127228 and IIS-1637949. Publisher Copyright: {\textcopyright} 2018, Springer Nature Switzerland AG.; 15th European Conference on Computer Vision, ECCV 2018 ; Conference date: 08-09-2018 Through 14-09-2018",

year = "2018",

doi = "10.1007/978-3-030-01270-0_16",

language = "English (US)",

isbn = "9783030012694",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Verlag",

pages = "263--281",

editor = "Yair Weiss and Vittorio Ferrari and Cristian Sminchisescu and Martial Hebert",

booktitle = "Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings",

}

TY - GEN

T1 - A Unified Framework for Multi-view Multi-class Object Pose Estimation

AU - Li, Chi

AU - Bai, Jin

AU - Hager, Gregory D.

N1 - Funding Information: Acknowledgments. This work is supported by the IARPA DIVA program and the National Science Foundation under grants IIS-127228 and IIS-1637949. Publisher Copyright: © 2018, Springer Nature Switzerland AG.

PY - 2018

Y1 - 2018

N2 - One core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.

AB - One core challenge in object pose estimation is to ensure accurate and robust performance for large numbers of diverse foreground objects amidst complex background clutter. In this work, we present a scalable framework for accurately inferring six Degree-of-Freedom (6-DoF) pose for a large number of object classes from single or multiple views. To learn discriminative pose features, we integrate three new capabilities into a deep Convolutional Neural Network (CNN): an inference scheme that combines both classification and pose regression based on a uniform tessellation of the Special Euclidean group in three dimensions (SE(3)), the fusion of class priors into the training process via a tiled class map, and an additional regularization using deep supervision with an object mask. Further, an efficient multi-view framework is formulated to address single-view ambiguity. We show that this framework consistently improves the performance of the single-view network. We evaluate our method on three large-scale benchmarks: YCB-Video, JHUScene-50 and ObjectNet-3D. Our approach achieves competitive or superior performance over the current state-of-the-art methods.

KW - Deep learning

KW - Multi-view recognition

KW - Object pose estimation

UR - http://www.scopus.com/inward/record.url?scp=85055093674&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055093674&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-01270-0_16

DO - 10.1007/978-3-030-01270-0_16

M3 - Conference contribution

AN - SCOPUS:85055093674

SN - 9783030012694

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 263

EP - 281

BT - Computer Vision – ECCV 2018 - 15th European Conference, 2018, Proceedings

A2 - Weiss, Yair

A2 - Ferrari, Vittorio

A2 - Sminchisescu, Cristian

A2 - Hebert, Martial

PB - Springer Verlag

T2 - 15th European Conference on Computer Vision, ECCV 2018

Y2 - 8 September 2018 through 14 September 2018

ER -
