TY - GEN
T1 - Synthesizing attributes with Unreal Engine for fine-grained activity analysis
AU - Kim, Tae Soo
AU - Peven, Mike
AU - Qiu, Weichao
AU - Yuille, Alan
AU - Hager, Gregory D.
N1 - Funding Information:
This work is supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) contract number D17PC00345. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/2/8
Y1 - 2019/2/8
AB - We examine the problem of activity recognition in video using simulated data for training. In contrast to the expensive task of obtaining accurate labels from real data, synthetic data creation is not only fast and scalable but also provides ground-truth labels for more than just the activities of interest, including segmentation masks, 3D object keypoints, and more. We aim to transfer a model trained on synthetic data to video in the real world. In this work, we provide a method for synthetic-to-real transfer at intermediate representations of a video. We perform activity recognition from a low-dimensional latent representation of a scene as a collection of visual attributes. Because ground-truth data for the attributes of interest, specifically the orientation of cars in the ground plane with respect to the camera, does not exist in the ActEV dataset, we synthesize this data. We show that a car orientation classifier can be transferred successfully, and we use its predictions within our defined set of visual attributes to classify actions in video.
UR - http://www.scopus.com/inward/record.url?scp=85063060331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063060331&partnerID=8YFLogxK
DO - 10.1109/WACVW.2019.00013
M3 - Conference contribution
AN - SCOPUS:85063060331
T3 - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019
SP - 35
EP - 37
BT - Proceedings - 2019 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2019
Y2 - 7 January 2019 through 11 January 2019
ER -