A large and growing corpus of synchronized kinematic and video recordings of robot-assisted surgery has the potential to facilitate training and subtask automation. One of the challenges in segmenting such multi-modal trajectories is that demonstrations vary spatially and temporally, and contain random noise and loops (repetitions until the desired result is achieved). Segments of task trajectories are often less complex and less variable than whole trajectories, and allow for easier detection of outliers. As manual segmentation can be tedious and error-prone, we propose a new segmentation method that combines hybrid dynamical systems theory and Bayesian non-parametric statistics to automatically segment demonstrations. Transition State Clustering (TSC) models demonstrations as noisy realizations of a switched linear dynamical system, and learns spatially and temporally consistent transition events across demonstrations. TSC uses a hierarchical Dirichlet Process Gaussian Mixture Model to avoid having to select the number of segments a priori. After a series of merging and pruning steps, the algorithm adaptively optimizes the number of segments. In a synthetic case study with two linear dynamical regimes, where demonstrations are corrupted with noise and temporal variations, TSC finds a segmentation up to 20% more accurate than GMM-based alternatives. On 67 recordings of surgical needle passing and suturing tasks from the JIGSAWS surgical training dataset, supplemented with manually annotated visual features, TSC finds 83% of the needle passing segments and 73% of the suturing segments found by human experts. Qualitatively, TSC also identifies transitions overlooked by human annotators.
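The core non-parametric idea above — clustering candidate transition states without fixing the number of clusters a priori — can be illustrated with a truncated Dirichlet Process GMM. This is a minimal sketch on synthetic 2-D "transition state" coordinates, not the paper's implementation; it uses scikit-learn's `BayesianGaussianMixture` with a stick-breaking (Dirichlet process) prior, which leaves the weights of unneeded mixture components near zero.

```python
# Hypothetical sketch: DP-GMM clustering of candidate transition states,
# so the number of spatial clusters is inferred rather than preset.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic transition states: two well-separated spatial clusters,
# standing in for transition events pooled across demonstrations.
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))
cluster_b = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(50, 2))
states = np.vstack([cluster_a, cluster_b])

# Truncated DP mixture: up to 10 components, but the stick-breaking
# prior concentrates the weight on the components the data support.
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    random_state=0,
).fit(states)

labels = dpgmm.predict(states)
n_effective = len(set(labels))
print("effective clusters:", n_effective)
```

For these well-separated toy clusters the model assigns the two groups to different components; on real kinematic data the paper's hierarchical variant additionally clusters transitions in time, which this sketch omits.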