Deep supervision with shape concepts for occlusion-aware 3D object parsing

Chi Li; M. Zeeshan Zia; Quoc Huy Tran; Xiang Yu; Gregory D. Hager; Manmohan Chandraker

doi:10.1109/CVPR.2017.49

Whiting School of Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Scopus citations

Abstract

Monocular 3D object parsing is highly desirable in various scenarios, including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture that localizes semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers so that they sequentially infer intermediate concepts associated with the final task. To acquire training data in the desired quantities with ground-truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performance on real-image benchmarks, including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA, for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, with less overfitting than standard end-to-end training.
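The idea of supervising hidden layers with a sequence of intermediate concepts can be illustrated with a minimal sketch. This is not the authors' implementation: the stage names, the per-stage weights, the squared-error loss, and the helper names `SupervisedStage` and `deeply_supervised_loss` are all assumptions made for illustration. Each stage transforms the previous hidden representation, an auxiliary head predicts that stage's concept, and the training objective sums the weighted per-stage losses instead of using only the loss at the final output.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("prediction and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


@dataclass
class SupervisedStage:
    """One block of hidden layers plus an auxiliary head (hypothetical names)."""
    name: str                               # concept supervised at this depth
    transform: Callable[[Vector], Vector]   # stands in for the hidden layers
    head: Callable[[Vector], Vector]        # auxiliary prediction of the concept
    weight: float                           # assumed loss weight for this stage


def deeply_supervised_loss(
    x: Vector,
    stages: Sequence[SupervisedStage],
    targets: Dict[str, Vector],
) -> Tuple[float, Dict[str, float]]:
    """Run the stages in order and sum the weighted loss of every stage.

    The hidden representation flows from one stage to the next, so later
    concepts are inferred from features that were already supervised on
    earlier concepts.
    """
    hidden = x
    per_stage: Dict[str, float] = {}
    total = 0.0
    for stage in stages:
        hidden = stage.transform(hidden)
        prediction = stage.head(hidden)
        loss = mse(prediction, targets[stage.name])
        per_stage[stage.name] = loss
        total += stage.weight * loss
    return total, per_stage
```

In a real network the transforms and heads would be learned layers and the total would be minimized by gradient descent; the sketch only shows how the objective is assembled. Standard end-to-end training corresponds to giving every stage except the last a weight of zero.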

Original language	English (US)
Title of host publication	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	388-397
Number of pages	10
ISBN (Electronic)	9781538604571
DOIs	https://doi.org/10.1109/CVPR.2017.49
State	Published - Nov 6 2017
Event	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 - Honolulu, United States Duration: Jul 21 2017 → Jul 26 2017

Publication series

Name	Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Volume	2017-January

Other

Other	30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Country/Territory	United States
City	Honolulu
Period	7/21/17 → 7/26/17

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition

Access to Document

10.1109/CVPR.2017.49

Cite this

Li, C., Zia, M. Z., Tran, Q. H., Yu, X., Hager, G. D., & Chandraker, M. (2017). Deep supervision with shape concepts for occlusion-aware 3D object parsing. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (pp. 388-397). (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CVPR.2017.49

Deep supervision with shape concepts for occlusion-aware 3D object parsing. / Li, Chi; Zia, M. Zeeshan; Tran, Quoc Huy et al.
Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc., 2017. p. 388-397 (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017; Vol. 2017-January).

Li, C, Zia, MZ, Tran, QH, Yu, X, Hager, GD & Chandraker, M 2017, Deep supervision with shape concepts for occlusion-aware 3D object parsing. in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, Institute of Electrical and Electronics Engineers Inc., pp. 388-397, 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, United States, 7/21/17. https://doi.org/10.1109/CVPR.2017.49

Li C, Zia MZ, Tran QH, Yu X, Hager GD, Chandraker M. Deep supervision with shape concepts for occlusion-aware 3D object parsing. In Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc. 2017. p. 388-397. (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017). doi: 10.1109/CVPR.2017.49

Li, Chi ; Zia, M. Zeeshan ; Tran, Quoc Huy et al. / Deep supervision with shape concepts for occlusion-aware 3D object parsing. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017. Institute of Electrical and Electronics Engineers Inc., 2017. pp. 388-397 (Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017).

@inproceedings{783e80f3d09848899678cdacc67c6289,

title = "Deep supervision with shape concepts for occlusion-aware 3D object parsing",

abstract = "Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.",

author = "Chi Li and Zia, {M. Zeeshan} and Tran, {Quoc Huy} and Xiang Yu and Hager, {Gregory D.} and Manmohan Chandraker",

note = "Funding Information: This work was part of C. Li's intern project at NEC Labs America, in Cupertino. We also acknowledge the support by NSF under Grant No. NRI-1227277. Publisher Copyright: {\textcopyright} 2017 IEEE.; 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 ; Conference date: 21-07-2017 Through 26-07-2017",

year = "2017",

month = nov,

day = "6",

doi = "10.1109/CVPR.2017.49",

language = "English (US)",

series = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "388--397",

booktitle = "Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017",

}

TY - GEN

T1 - Deep supervision with shape concepts for occlusion-aware 3D object parsing

AU - Li, Chi

AU - Zia, M. Zeeshan

AU - Tran, Quoc Huy

AU - Yu, Xiang

AU - Hager, Gregory D.

AU - Chandraker, Manmohan

N1 - Funding Information: This work was part of C. Li's intern project at NEC Labs America, in Cupertino. We also acknowledge the support by NSF under Grant No. NRI-1227277. Publisher Copyright: © 2017 IEEE.

PY - 2017/11/6

Y1 - 2017/11/6

N2 - Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply supervising its hidden layers, in order to sequentially infer intermediate concepts associated with the final task. To acquire training data in desired quantities with ground truth 3D shape and relevant concepts, we render 3D object CAD models to generate large-scale synthetic data and simulate challenging occlusion configurations between objects. We train the network only on synthetic data and demonstrate state-of-the-art performances on real image benchmarks including an extended version of KITTI, PASCAL VOC, PASCAL3D+ and IKEA for 2D and 3D keypoint localization and instance segmentation. The empirical results substantiate the utility of our deep supervision scheme by demonstrating effective transfer of knowledge from synthetic data to real images, resulting in less overfitting compared to standard end-to-end training.

UR - http://www.scopus.com/inward/record.url?scp=85044316084&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044316084&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2017.49

DO - 10.1109/CVPR.2017.49

M3 - Conference contribution

AN - SCOPUS:85044316084

T3 - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

SP - 388

EP - 397

BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Y2 - 21 July 2017 through 26 July 2017

ER -
