Impact of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping algorithm for specifying subjects with type 2 diabetes mellitus

Wei Qi Wei, Cynthia L. Leibson, Jeanine E. Ransom, Abel N. Kho, Pedro J. Caraballo, High Seng Chai, Barbara P. Yawn, Jennifer A. Pacheco, Christopher Chute

Research output: Contribution to journal › Article


Objective: To evaluate the effect of data fragmentation across healthcare centers on the accuracy of a high-throughput clinical phenotyping (HTCP) algorithm developed to differentiate (1) patients with type 2 diabetes mellitus (T2DM) and (2) patients with no diabetes.

Materials and methods: This population-based study identified all Olmsted County, Minnesota residents in 2007. We used provider-linked electronic medical record data from the two healthcare centers that provide >95% of all care to County residents (ie, Olmsted Medical Center and Mayo Clinic in Rochester, Minnesota, USA). Subjects were limited to residents with one or more encounters from January 1, 2006 through December 31, 2007 at both healthcare centers. DM-relevant data on diagnoses, laboratory results, and medication from both centers were obtained during this period. The algorithm was first executed using data from both centers (ie, the gold standard) and then using data from Mayo Clinic alone. Positive predictive values and false-negative rates were calculated, and the McNemar test was used to compare categorization based on Mayo Clinic data alone with the gold standard. Age and sex were compared between true-positive and false-negative subjects with T2DM. Statistical significance was accepted as p
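The comparison described in the methods (positive predictive value, false-negative rate, and a McNemar test of single-center results against the two-center gold standard) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `compare_to_gold` and `mcnemar_exact` are hypothetical, the inputs are assumed to be paired boolean labels per subject (True = classified as T2DM), and the exact binomial form of the McNemar test is an assumption, since the abstract does not say which variant was used.

```python
import math


def compare_to_gold(gold, single):
    """Compare single-center labels against gold-standard labels.

    gold, single: equal-length sequences of booleans, one pair per subject
    (True = classified as T2DM). Returns a dict of counts and rates.
    """
    tp = sum(g and s for g, s in zip(gold, single))
    fp = sum((not g) and s for g, s in zip(gold, single))
    fn = sum(g and (not s) for g, s in zip(gold, single))
    tn = sum((not g) and (not s) for g, s in zip(gold, single))
    # PPV: share of single-center positives confirmed by the gold standard.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # FNR: share of gold-standard positives missed with single-center data.
    fnr = fn / (tp + fn) if (tp + fn) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "ppv": ppv, "fnr": fnr}


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    Under the null hypothesis, each discordant pair is equally likely to
    fall in either discordant cell, so the smaller count follows a
    Binomial(b + c, 0.5) distribution.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, 2.0 * tail)
```

With this layout, the discordant pairs passed to `mcnemar_exact` are the false negatives and false positives returned by `compare_to_gold`, for example `mcnemar_exact(result["fn"], result["fp"])`.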

Original language: English (US)
Pages (from-to): 219-224
Number of pages: 6
Journal: Journal of the American Medical Informatics Association
Issue number: 2
Publication status: Published - Mar 2012
Externally published: Yes


ASJC Scopus subject areas

  • Health Informatics
