We examine the problem of activity recognition in video using simulated data for training. In contrast to the expensive task of obtaining accurate labels from real data, synthetic data creation is not only fast and scalable, but also provides ground-truth labels beyond the activities of interest, such as segmentation masks and 3D object keypoints. Our goal is to transfer a model trained on synthetic data to real-world video. In this work, we present a method for synthetic-to-real transfer at an intermediate representation of the video: we perform activity recognition from a low-dimensional latent representation of the scene as a collection of visual attributes. Because the ActEV dataset lacks ground truth for the attribute of interest, namely the ground-plane orientation of cars with respect to the camera, we synthesize this data. We show that a car-orientation classifier transfers successfully from synthetic to real imagery, and that its predictions, incorporated into our defined set of visual attributes, allow us to classify actions in video.
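The pipeline described above can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: the orientation bins, attribute layout, and rule-based action classifier are all assumptions chosen for clarity, standing in for the learned models.

```python
# Hypothetical sketch of the attribute-based pipeline: an orientation
# classifier (trained on synthetic data) predicts a discrete ground-plane
# orientation bin per frame; the bin is appended to other per-frame
# attributes, and an action classifier operates on the attribute vectors.
# All names, bin counts, and the toy classifiers are illustrative only.

from math import atan2, degrees

N_BINS = 8  # discretise ground-plane orientation into 8 bins of 45 degrees


def orientation_bin(dx, dy):
    """Stand-in for the learned orientation classifier: map a car's
    heading vector in the ground plane to one of N_BINS bins."""
    angle = degrees(atan2(dy, dx)) % 360.0
    return int(angle // (360.0 / N_BINS))


def attribute_vector(speed, bin_idx):
    """Per-frame visual attributes: scalar speed plus a one-hot
    encoding of the predicted orientation bin."""
    one_hot = [0.0] * N_BINS
    one_hot[bin_idx] = 1.0
    return [speed] + one_hot


def classify_action(frames):
    """Toy action classifier over a sequence of attribute vectors:
    a moving car whose orientation bin changes is 'turning', a moving
    car with constant orientation is 'driving straight', otherwise
    'parked'. A real model would be learned, not rule-based."""
    moving = [f for f in frames if f[0] > 0.5]
    if not moving:
        return "parked"
    bins = {tuple(f[1:]) for f in moving}
    return "turning" if len(bins) > 1 else "driving straight"


# A car heading east, then northeast, then north at constant speed:
frames = [attribute_vector(2.0, orientation_bin(1, 0)),
          attribute_vector(2.0, orientation_bin(1, 1)),
          attribute_vector(2.0, orientation_bin(0, 1))]
print(classify_action(frames))  # orientation bin changes -> "turning"
```

The point of the intermediate attribute representation is that only the orientation classifier needs to bridge the synthetic-to-real gap; the action classifier sees low-dimensional attributes, which look the same regardless of whether the pixels were rendered or real.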