TY - GEN
T1 - Incremental scene understanding on dense SLAM
AU - Li, Chi
AU - Xiao, Han
AU - Tateno, Keisuke
AU - Tombari, Federico
AU - Navab, Nassir
AU - Hager, Gregory D.
N1 - Funding Information:
This work is supported by the National Science Foundation under Grant No. NRI-1227277
Publisher Copyright:
© 2016 IEEE.
Copyright:
Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2016/11/28
Y1 - 2016/11/28
N2 - We present an architecture for online, incremental scene modeling which combines a SLAM-based scene understanding framework with semantic segmentation and object pose estimation. The core of this approach comprises a probabilistic inference scheme that predicts semantic labels for object hypotheses at each new frame. From these hypotheses, recognized scene structures are incrementally constructed and tracked. Semantic labels are inferred using a multi-domain convolutional architecture which operates on the image time series and which enables efficient propagation of features as well as robust model registration. To evaluate this architecture, we introduce a large-scale RGB-D dataset, JHUSEQ-25, as a new benchmark for sequence-based scene understanding in complex and densely cluttered scenes. This dataset contains 25 RGB-D video sequences with 100,000 labeled frames in total. We validate our method on this dataset and demonstrate improved performance in semantic segmentation and 6-DoF object pose estimation compared with single-view methods.
AB - We present an architecture for online, incremental scene modeling which combines a SLAM-based scene understanding framework with semantic segmentation and object pose estimation. The core of this approach comprises a probabilistic inference scheme that predicts semantic labels for object hypotheses at each new frame. From these hypotheses, recognized scene structures are incrementally constructed and tracked. Semantic labels are inferred using a multi-domain convolutional architecture which operates on the image time series and which enables efficient propagation of features as well as robust model registration. To evaluate this architecture, we introduce a large-scale RGB-D dataset, JHUSEQ-25, as a new benchmark for sequence-based scene understanding in complex and densely cluttered scenes. This dataset contains 25 RGB-D video sequences with 100,000 labeled frames in total. We validate our method on this dataset and demonstrate improved performance in semantic segmentation and 6-DoF object pose estimation compared with single-view methods.
UR - http://www.scopus.com/inward/record.url?scp=85006499472&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006499472&partnerID=8YFLogxK
U2 - 10.1109/IROS.2016.7759111
DO - 10.1109/IROS.2016.7759111
M3 - Conference contribution
AN - SCOPUS:85006499472
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 574
EP - 581
BT - IROS 2016 - 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016
Y2 - 9 October 2016 through 14 October 2016
ER -