Hierarchical semantic parsing for object pose estimation in densely cluttered scenes

Chi Li, Jonathan Bohren, Eric Carlson, Gregory Hager

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm which partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm includes a generalized pooling scheme used to construct robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset which serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method demonstrates improved performance of pose estimation on this new dataset compared with other state-of-the-art methods.
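The registration step mentioned in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: it assumes putative 3D point correspondences between an object model and a segmented scene region are already available, and it uses the Kabsch least-squares fit as the minimal solver inside a plain RANSAC loop. The function names `kabsch` and `ransac_pose`, the inlier threshold, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ~ dst_i."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_pose(src, dst, iters=200, thresh=0.01, seed=0):
    """Estimate a 6-DoF pose from correspondences src[i] <-> dst[i] with outliers.

    src, dst: (N, 3) arrays of matched model and scene points (assumed given).
    thresh:   inlier distance threshold, in the same units as the points.
    Returns (R, t, inlier_mask).
    """
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Three correspondences are the minimum needed to fix a rigid transform.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        residuals = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = residuals < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no consensus set of at least 3 points")
    # Refit on the whole consensus set for the final pose.
    R, t = kabsch(src[best], dst[best])
    return R, t, best
```

In a pipeline like the one the abstract describes, a routine of this kind would be run separately on the points assigned to each object class by the segmentation stage; how the paper obtains correspondences and scores hypotheses is not stated in this record.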

Original language: English (US)
Title of host publication: 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5068-5075
Number of pages: 8
Volume: 2016-June
ISBN (Electronic): 9781467380263
DOIs: https://doi.org/10.1109/ICRA.2016.7487712
State: Published - Jun 8 2016
Event: 2016 IEEE International Conference on Robotics and Automation, ICRA 2016 - Stockholm, Sweden
Duration: May 16 2016 → May 21 2016

Other

Other: 2016 IEEE International Conference on Robotics and Automation, ICRA 2016
Country: Sweden
City: Stockholm
Period: 5/16/16 → 5/21/16

Fingerprint

Semantics
Hand tools
Object recognition

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Electrical and Electronic Engineering

Cite this

Li, C., Bohren, J., Carlson, E., & Hager, G. (2016). Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. In 2016 IEEE International Conference on Robotics and Automation, ICRA 2016 (Vol. 2016-June, pp. 5068-5075). [7487712] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICRA.2016.7487712

Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. / Li, Chi; Bohren, Jonathan; Carlson, Eric; Hager, Gregory.

2016 IEEE International Conference on Robotics and Automation, ICRA 2016. Vol. 2016-June Institute of Electrical and Electronics Engineers Inc., 2016. p. 5068-5075 7487712.

Li, C, Bohren, J, Carlson, E & Hager, G 2016, Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. in 2016 IEEE International Conference on Robotics and Automation, ICRA 2016. vol. 2016-June, 7487712, Institute of Electrical and Electronics Engineers Inc., pp. 5068-5075, 2016 IEEE International Conference on Robotics and Automation, ICRA 2016, Stockholm, Sweden, 5/16/16. https://doi.org/10.1109/ICRA.2016.7487712
Li C, Bohren J, Carlson E, Hager G. Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. In 2016 IEEE International Conference on Robotics and Automation, ICRA 2016. Vol. 2016-June. Institute of Electrical and Electronics Engineers Inc. 2016. p. 5068-5075. 7487712 https://doi.org/10.1109/ICRA.2016.7487712
Li, Chi ; Bohren, Jonathan ; Carlson, Eric ; Hager, Gregory. / Hierarchical semantic parsing for object pose estimation in densely cluttered scenes. 2016 IEEE International Conference on Robotics and Automation, ICRA 2016. Vol. 2016-June Institute of Electrical and Electronics Engineers Inc., 2016. pp. 5068-5075
@inproceedings{8b757bd04f3c4b7cb838946dba4ae37e,
title = "Hierarchical semantic parsing for object pose estimation in densely cluttered scenes",
abstract = "Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm which partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm includes a generalized pooling scheme used to construct robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset which serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method demonstrates improved performance of pose estimation on this new dataset compared with other state-of-the-art methods.",
author = "Chi Li and Jonathan Bohren and Eric Carlson and Gregory Hager",
year = "2016",
month = "6",
day = "8",
doi = "10.1109/ICRA.2016.7487712",
language = "English (US)",
volume = "2016-June",
pages = "5068--5075",
booktitle = "2016 IEEE International Conference on Robotics and Automation, ICRA 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Hierarchical semantic parsing for object pose estimation in densely cluttered scenes

AU - Li, Chi

AU - Bohren, Jonathan

AU - Carlson, Eric

AU - Hager, Gregory

PY - 2016/6/8

Y1 - 2016/6/8

AB - Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm which partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm includes a generalized pooling scheme used to construct robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset which serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method demonstrates improved performance of pose estimation on this new dataset compared with other state-of-the-art methods.

UR - http://www.scopus.com/inward/record.url?scp=84977518532&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84977518532&partnerID=8YFLogxK

U2 - 10.1109/ICRA.2016.7487712

DO - 10.1109/ICRA.2016.7487712

M3 - Conference contribution

AN - SCOPUS:84977518532

VL - 2016-June

SP - 5068

EP - 5075

BT - 2016 IEEE International Conference on Robotics and Automation, ICRA 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -