In this paper, we combine robust depth information with a simple color-shape appearance model for single-object tracking in crowded dynamic scenes. Because the binocular video streams are captured from a moving camera rig, background subtraction cannot reliably isolate the region of interest. Our main contribution is a novel tracking strategy that uses explicit stereo depth to track and segment the object in crowded dynamic scenes while handling occlusion. Appearance cues, including color and shape, play a secondary role: they refine the foreground obtained from the preceding depth-based segmentation. The proposed depth-driven approach largely alleviates drift, especially when the object frequently interacts with similar-looking background over long sequences. Problems caused by rapid changes in object appearance can also be avoided, owing to the stability of the depth cue. Furthermore, we propose a new, simple, and effective depth-based scheme to cope with complete occlusion during tracking. In experiments on a large collection of challenging outdoor and indoor sequences, our algorithm delivers accurate and reliable tracking and outperforms competing state-of-the-art algorithms.