Accurate interpretation and quantification of magnetic resonance imaging (MRI) is vital to medical research and clinical practice. However, a lack of MRI standardization and differences in acquisition protocols often lead to measurement inconsistencies across sites. Image harmonization techniques have been shown to improve qualitative and quantitative consistency between differently acquired scans. Unfortunately, these methods typically require either paired training data from traveling subjects (for supervised methods) or assumptions about anatomical similarity between the populations (for unsupervised methods). We propose a deep learning-based harmonization technique with limited supervision for standardizing images across scanners and sites. By leveraging a disentangled latent space, represented by a high-resolution anatomical information component (β) and a low-dimensional contrast component (θ), the proposed method trains a cross-site harmonization model using databases of multi-modal image pairs acquired separately at each of the scanners to be harmonized. In this manuscript, we show that by using T1-weighted and T2-weighted images acquired from different subjects at three different sites, we can achieve a stable extraction of β with a continuous representation of θ. We also demonstrate that this allows cross-site harmonization without the need for paired data between sites.
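The core β/θ recombination idea (encode anatomy from one image, contrast from another, then decode) can be illustrated with a toy NumPy sketch. All encoders and the decoder below are illustrative placeholders, not the paper's learned networks; here "contrast" is reduced to global mean/standard deviation for the sake of a runnable example:

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG = 16  # toy "image" dimensionality (a flattened patch)

def encode_beta(x):
    # Placeholder anatomy encoder: identity here; a learned CNN in practice.
    return x

def encode_theta(x):
    # Placeholder contrast encoder: a low-dimensional summary of
    # global intensity statistics (mean, std).
    return np.array([x.mean(), x.std()])

def decode(beta, theta):
    # Placeholder decoder: normalizes the anatomy code, then re-applies
    # the target contrast statistics.
    b = (beta - beta.mean()) / (beta.std() + 1e-8)
    return b * theta[1] + theta[0]

# The same anatomy imaged at two "sites" with different contrast settings.
anatomy = rng.random(D_IMG)
site_a = anatomy * 1.0 + 0.0    # site A contrast
site_b = anatomy * 2.5 + 10.0   # site B contrast (different scanner)

# Harmonize site A toward site B: anatomy (β) from the site-A image,
# contrast (θ) from an unpaired site-B image.
harmonized = decode(encode_beta(site_a), encode_theta(site_b))
print(np.allclose(harmonized, site_b, atol=1e-6))  # → True
```

In this toy setting harmonization recovers the site-B appearance exactly because "contrast" is a simple affine transform; the point of the sketch is only the recombination pattern, in which θ may come from a different subject than β, which is what removes the need for traveling-subject pairs.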