Connecting population-level AUC and latent scale-invariant R² via Semiparametric Gaussian Copula and rank correlations

Research output: Contribution to journal › Article › peer-review

Abstract

Area Under the Curve (AUC) is arguably the most popular measure of classification accuracy. We use a semiparametric framework to introduce a latent scale-invariant R², a novel measure of variation explained for an observed binary outcome and an observed continuous predictor, and then directly link the latent R² to AUC. This enables mutually consistent, simultaneous use of AUC as a measure of classification accuracy and the latent R² as a scale-invariant measure of explained variation. Specifically, we employ the Semiparametric Gaussian Copula (SGC) to model the joint dependence between an observed binary outcome and an observed continuous predictor via the correlation of latent standard normal random variables. Under SGC, we show how both the population-level AUC and the latent scale-invariant R², defined as the squared latent correlation, can be estimated using any of four rank statistics calculated on binary-continuous pairs: the Wilcoxon rank-sum statistic and the Kendall’s Tau, Spearman’s Rho, and Quadrant rank correlations. We then focus on three implications and applications: i) we explicitly show that, under SGC, the population-level AUC and the population-level latent R² are related via a monotone function that depends on the population-level prevalence rate; ii) we propose the Quadrant rank correlation as a robust semiparametric version of AUC; iii) we demonstrate how, under complex-survey designs, the Wilcoxon rank-sum statistic and the Spearman and Quadrant rank correlations provide asymptotically consistent estimators of the population-level AUC using only single-participant survey weights. We illustrate these applications using the binary outcome of five-year mortality and continuous predictors including albumin, systolic blood pressure, and accelerometry-derived measures of total volume of physical activity collected in the 2003-2006 National Health and Nutrition Examination Survey (NHANES) cohorts.
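The setup described in the abstract can be explored with a small simulation. The sketch below is not the authors' estimator; it is a minimal illustration under stated assumptions: the binary outcome is obtained by thresholding one latent standard normal variable, the continuous predictor is an arbitrary increasing transform (here `exp`, chosen only for illustration) of a second latent normal variable, and the two latent variables have correlation `r`, so the latent R² is `r**2`. The function names `simulate_sgc` and `auc_rank_sum` are hypothetical. The empirical AUC is computed from the Wilcoxon rank-sum statistic, which depends only on ranks and is therefore unchanged by monotone transformations of the predictor.

```python
from statistics import NormalDist

import numpy as np


def simulate_sgc(n, r, prevalence, rng):
    """Draw (y, x) pairs from an assumed Gaussian-copula model with latent correlation r."""
    z1 = rng.standard_normal(n)
    z2 = r * z1 + np.sqrt(1.0 - r**2) * rng.standard_normal(n)
    # Binary outcome: threshold the first latent variable so that P(y = 1) = prevalence.
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    y = (z1 > threshold).astype(int)
    # Continuous predictor: an arbitrary strictly increasing transform of the second latent variable.
    x = np.exp(z2)
    return y, x


def auc_rank_sum(y, x):
    """Empirical AUC from the Wilcoxon rank-sum statistic (assumes no ties in x)."""
    ranks = np.empty(len(x))
    ranks[np.argsort(x)] = np.arange(1, len(x) + 1)
    n1 = int(y.sum())
    n0 = len(y) - n1
    # Mann-Whitney U: rank sum of the y = 1 group minus its minimum possible value.
    u = ranks[y == 1].sum() - n1 * (n1 + 1) / 2.0
    return u / (n1 * n0)


if __name__ == "__main__":
    rng = np.random.default_rng(2019)
    for r in (0.0, 0.3, 0.6):
        y, x = simulate_sgc(50_000, r, prevalence=0.2, rng=rng)
        print(f"latent r = {r:.1f}, latent R^2 = {r**2:.2f}, "
              f"empirical AUC = {auc_rank_sum(y, x):.3f}")
```

In this simulation the empirical AUC increases with the latent correlation at a fixed prevalence, and replacing `x` with `np.log(x)` leaves the AUC unchanged, which is the scale invariance the rank-based approach relies on.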

Original language: English (US)
Journal: Unknown Journal
State: Published - Oct 30 2019

Keywords

  • AUC
  • Classification
  • Complex Surveys
  • Copula
  • Rank statistics
  • Variance explained

ASJC Scopus subject areas

  • General
