The tongue produces intelligible speech through the successful orchestration, over time, of muscle groupings (i.e., functional units) drawn from its highly complex musculature. Because the tongue produces many different motions, functional units are transitional structures that transform muscle activity into surface tongue geometry, and they vary significantly from one subject to another. To compare and contrast the location and size of functional units in the presence of such substantial inter-subject variability, it is essential to study both common and subject-specific functional units in a group of people carrying out the same speech task. In this work, we present a new normalization technique that simultaneously identifies the common and subject-specific functional units of the tongue from motion tracked by tagged magnetic resonance imaging. To this end, we use a joint sparse non-negative matrix factorization framework, which learns a set of building blocks together with subject-specific and common weighting matrices from motion quantities extracted from displacements. A spectral clustering technique is then applied to the subject-specific and common weighting matrices to determine the subject-specific functional units for each subject and the common functional units across subjects. Our experimental results on in vivo tongue motion data show that the approach identifies the common and subject-specific functional units of tongue motion during speech with reduced size variability.
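
The two-stage pipeline described above (joint factorization, then clustering of the weighting matrices) can be illustrated with a minimal sketch. It is not the formulation used in this work: the additive model X_s ≈ W(H_c + H_s), the L1 penalty weight `lam`, the multiplicative updates, the cosine affinity, and the function names `joint_sparse_nmf` and `spectral_clusters` are all assumptions chosen for illustration. Each X_s is assumed to be a non-negative (motion features × tissue points) matrix with the same points for every subject.

```python
import numpy as np

def joint_sparse_nmf(Xs, k, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Assumed model: X_s ~ W @ (H_c + H_s), all factors non-negative, L1 on weights."""
    rng = np.random.default_rng(seed)
    m, n = Xs[0].shape
    W = rng.random((m, k))                       # shared building blocks
    Hc = rng.random((k, n))                      # common weighting matrix
    Hs = [rng.random((k, n)) for _ in Xs]        # subject-specific weighting matrices

    def objective():
        fit = sum(np.sum((X - W @ (Hc + H)) ** 2) for X, H in zip(Xs, Hs))
        l1 = Hc.sum() + sum(H.sum() for H in Hs)
        return 0.5 * fit + lam * l1

    history = [objective()]
    for _ in range(n_iter):
        # Building blocks: pool the evidence from every subject.
        num = sum(X @ (Hc + H).T for X, H in zip(Xs, Hs))
        den = sum(W @ (Hc + H) @ (Hc + H).T for H in Hs)
        W *= num / (den + eps)
        WtW = W.T @ W
        # Subject-specific weights: one multiplicative update per subject.
        for i, X in enumerate(Xs):
            Hs[i] *= (W.T @ X) / (WtW @ (Hc + Hs[i]) + lam + eps)
        # Common weights: numerator and denominator summed over subjects.
        num = sum(W.T @ X for X in Xs)
        den = sum(WtW @ (Hc + H) for H in Hs) + lam
        Hc *= num / (den + eps)
        history.append(objective())
    return W, Hc, Hs, history

def spectral_clusters(H, n_clusters, seed=0, n_iter=50):
    """Group the columns of H (one column per tissue point) by spectral clustering."""
    rng = np.random.default_rng(seed)
    cols = H / (np.linalg.norm(H, axis=0, keepdims=True) + 1e-12)
    A = cols.T @ cols                            # cosine affinity (>= 0 because H >= 0)
    d = A.sum(axis=1)
    L = np.eye(len(A)) - A / (np.sqrt(np.outer(d, d)) + 1e-12)  # normalized Laplacian
    _, vecs = np.linalg.eigh(L)                  # eigenvalues in ascending order
    E = vecs[:, :n_clusters]                     # spectral embedding of the points
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-12)
    # Plain k-means on the embedded points.
    centers = E[rng.choice(len(E), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((E[:, None, :] - centers[None]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = E[labels == c].mean(axis=0)
    return labels
```

Under these assumptions, `spectral_clusters(Hc, n_clusters)` would give candidate common functional units, and `spectral_clusters(Hc + Hs[i], n_clusters)` or `spectral_clusters(Hs[i], n_clusters)` candidate units for subject `i`; which matrix to cluster, and how many clusters to request, are modeling choices the sketch leaves open.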