Correlating two continuous variables subject to detection limits in the context of mixture distributions

Haitao Chu, Lawrence Hale Moulton, Wendy J. Mack, Douglas J. Passaro, Paulo F. Barroso, Alvaro Munoz

Research output: Contribution to journal › Article

Abstract

In individuals who are infected with human immunodeficiency virus (HIV), distributions of quantitative HIV ribonucleic acid measurements may be highly left censored with an extra spike below the limit of detection LD of the assay. A two-component mixture model with the lower component entirely supported on [0, LD] is recommended to model the extra spike in univariate analysis better. Let LD1 and LD2 be the limits of detection for the two HIV viral load measurements. When estimating the correlation coefficient between two different measures of viral load obtained from each of a sample of patients, a bivariate Gaussian mixture model is recommended to model the extra spike on [0, LD1] and [0, LD2] better when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. When the proportion of both variables falling below LD is very large, the parameters of the lower component may not be estimable since almost all observations from the lower component fall below LD. A partial solution is to assume that the lower component's entire support is on [0, LD1] × [0, LD2]. Maximum likelihood is used to estimate the parameters of the lower and higher components. To evaluate whether there is a lower component, we apply a Monte Carlo approach to assess the p-value of the likelihood ratio test and two information criteria: a bootstrap-based information criterion and a cross-validation-based information criterion. We provide simulation results to evaluate the performance and compare it with two ad hoc estimators and a single-component bivariate Gaussian likelihood estimator. These methods are applied to the data from a cohort study of HIV-infected men in Rio de Janeiro, Brazil, and the data from the Women's Interagency HIV oral study. These results emphasize the need for caution when estimating correlation coefficients from data with a large proportion of non-detectable values when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution.
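
As a rough illustration of the univariate case described in the abstract, the log-likelihood of a two-component mixture whose lower component lies entirely below LD might be sketched as follows. This is not the authors' code. The Gaussian upper component on a log10 scale, the function names, the parameter values, and the data are all assumptions made for the example, and the bivariate model in the paper is more involved.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_loglik(values, ld, p, mu, sigma):
    """Log-likelihood of a left-censored two-component mixture (hypothetical helper).

    With probability p an observation comes from a lower component that lies
    entirely below ld, so it is always recorded as non-detectable. With
    probability 1 - p it comes from N(mu, sigma^2) and is non-detectable only
    when it falls below ld. Any value below ld is treated as censored.
    """
    if not 0.0 <= p < 1.0 or sigma <= 0.0:
        raise ValueError("need 0 <= p < 1 and sigma > 0")
    # Probability that a measurement is non-detectable under the mixture.
    prob_below = p + (1.0 - p) * norm_cdf((ld - mu) / sigma)
    total = 0.0
    for x in values:
        if x < ld:
            total += math.log(prob_below)
        else:
            total += math.log(1.0 - p) + norm_logpdf(x, mu, sigma)
    return total


# Made-up log10 viral loads; 1.0 stands in for "below the detection limit".
data = [1.0, 1.0, 1.0, 1.0, 3.2, 4.1, 4.8, 5.0]
ld = 2.6
# Setting p = 0 recovers the single-component censored Gaussian likelihood.
ll_single = mixture_loglik(data, ld, 0.0, 3.5, 1.5)
ll_mixture = mixture_loglik(data, ld, 0.4, 4.3, 0.8)
print(ll_single, ll_mixture)
```

The parameter values above are fixed by hand purely for illustration. A maximum likelihood fit would instead maximize this function over p, mu and sigma with a numerical optimizer, which is not shown here.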

Original language: English (US)
Pages (from-to): 831-845
Number of pages: 15
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 54
Issue number: 5
DOI: 10.1111/j.1467-9876.2005.00512.x
State: Published - 2005

Fingerprint

Mixture of Distributions
Detection Limit
Continuous Variables
Virus
Information Criterion
Proportion
Spike
Bivariate Distribution
Correlation coefficient
Gaussian distribution
Tail
Estimator
Cohort Study
Evaluate
Gaussian Mixture Model
Likelihood Ratio Test
p-Value
Component Model
Context

Keywords

  • Bootstrap-based information criterion
  • Correlation coefficient
  • Cross-validation-based information criterion
  • Human immunodeficiency virus
  • Left censoring
  • Likelihood ratio test
  • Mixture model
  • Model selection

ASJC Scopus subject areas

  • Mathematics(all)
  • Statistics and Probability

Cite this

Correlating two continuous variables subject to detection limits in the context of mixture distributions. / Chu, Haitao; Moulton, Lawrence Hale; Mack, Wendy J.; Passaro, Douglas J.; Barroso, Paulo F.; Munoz, Alvaro.

In: Journal of the Royal Statistical Society. Series C: Applied Statistics, Vol. 54, No. 5, 2005, p. 831-845.

@article{be3ace88649347d583babb112ca284c9,
title = "Correlating two continuous variables subject to detection limits in the context of mixture distributions",
abstract = "In individuals who are infected with human immunodeficiency virus (HIV), distributions of quantitative HIV ribonucleic acid measurements may be highly left censored with an extra spike below the limit of detection LD of the assay. A two-component mixture model with the lower component entirely supported on [0, LD] is recommended to model the extra spike in univariate analysis better. Let LD1 and LD2 be the limits of detection for the two HIV viral load measurements. When estimating the correlation coefficient between two different measures of viral load obtained from each of a sample of patients, a bivariate Gaussian mixture model is recommended to model the extra spike on [0, LD1] and [0, LD2] better when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. When the proportion of both variables falling below LD is very large, the parameters of the lower component may not be estimable since almost all observations from the lower component fall below LD. A partial solution is to assume that the lower component's entire support is on [0, LD1] × [0, LD2]. Maximum likelihood is used to estimate the parameters of the lower and higher components. To evaluate whether there is a lower component, we apply a Monte Carlo approach to assess the p-value of the likelihood ratio test and two information criteria: a bootstrap-based information criterion and a cross-validation-based information criterion. We provide simulation results to evaluate the performance and compare it with two ad hoc estimators and a single-component bivariate Gaussian likelihood estimator. These methods are applied to the data from a cohort study of HIV-infected men in Rio de Janeiro, Brazil, and the data from the Women's Interagency HIV oral study. These results emphasize the need for caution when estimating correlation coefficients from data with a large proportion of non-detectable values when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution.",
keywords = "Bootstrap-based information criterion, Correlation coefficient, Cross-validation-based information criterion, Human immunodeficiency virus, Left censoring, Likelihood ratio test, Mixture model, Model selection",
author = "Chu, Haitao and Moulton, Lawrence Hale and Mack, Wendy J. and Passaro, Douglas J. and Barroso, Paulo F. and Munoz, Alvaro",
year = "2005",
doi = "10.1111/j.1467-9876.2005.00512.x",
language = "English (US)",
volume = "54",
pages = "831--845",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "5",
}

TY - JOUR

T1 - Correlating two continuous variables subject to detection limits in the context of mixture distributions

AU - Chu, Haitao

AU - Moulton, Lawrence Hale

AU - Mack, Wendy J.

AU - Passaro, Douglas J.

AU - Barroso, Paulo F.

AU - Munoz, Alvaro

PY - 2005

Y1 - 2005

AB - In individuals who are infected with human immunodeficiency virus (HIV), distributions of quantitative HIV ribonucleic acid measurements may be highly left censored with an extra spike below the limit of detection LD of the assay. A two-component mixture model with the lower component entirely supported on [0, LD] is recommended to model the extra spike in univariate analysis better. Let LD1 and LD2 be the limits of detection for the two HIV viral load measurements. When estimating the correlation coefficient between two different measures of viral load obtained from each of a sample of patients, a bivariate Gaussian mixture model is recommended to model the extra spike on [0, LD1] and [0, LD2] better when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution. When the proportion of both variables falling below LD is very large, the parameters of the lower component may not be estimable since almost all observations from the lower component fall below LD. A partial solution is to assume that the lower component's entire support is on [0, LD1] × [0, LD2]. Maximum likelihood is used to estimate the parameters of the lower and higher components. To evaluate whether there is a lower component, we apply a Monte Carlo approach to assess the p-value of the likelihood ratio test and two information criteria: a bootstrap-based information criterion and a cross-validation-based information criterion. We provide simulation results to evaluate the performance and compare it with two ad hoc estimators and a single-component bivariate Gaussian likelihood estimator. These methods are applied to the data from a cohort study of HIV-infected men in Rio de Janeiro, Brazil, and the data from the Women's Interagency HIV oral study. These results emphasize the need for caution when estimating correlation coefficients from data with a large proportion of non-detectable values when the proportion below LD is incompatible with the left-hand tail of a bivariate Gaussian distribution.

KW - Bootstrap-based information criterion

KW - Correlation coefficient

KW - Cross-validation-based information criterion

KW - Human immunodeficiency virus

KW - Left censoring

KW - Likelihood ratio test

KW - Mixture model

KW - Model selection

UR - http://www.scopus.com/inward/record.url?scp=27344451118&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27344451118&partnerID=8YFLogxK

U2 - 10.1111/j.1467-9876.2005.00512.x

DO - 10.1111/j.1467-9876.2005.00512.x

M3 - Article

AN - SCOPUS:27344451118

VL - 54

SP - 831

EP - 845

JO - Journal of the Royal Statistical Society. Series C: Applied Statistics

JF - Journal of the Royal Statistical Society. Series C: Applied Statistics

SN - 0035-9254

IS - 5

ER -