Comparison of methods for analyzing left-censored occupational exposure data

Tran Huynh, Gurumurthy Ramachandran, Sudipto Banerjee, Joao Monteiro, Mark Stenzel, Dale P. Sandler, Lawrence S. Engel, Richard K. Kwok, Aaron Blair, Patricia A. Stewart

Research output: Contribution to journal › Article

Abstract

The National Institute of Environmental Health Sciences (NIEHS) is conducting an epidemiologic study (the GuLF STUDY) to investigate the health of the workers and volunteers who participated, from April to December 2010, in the response to and cleanup of the oil release that followed the Deepwater Horizon explosion in the Gulf of Mexico. The exposure assessment component of the study involves analyzing thousands of personal monitoring measurements collected during this effort, a substantial portion of which were reported by the analytic laboratories as below the limits of detection (LOD). A simulation study was conducted to evaluate three established methods for analyzing censored data: maximum likelihood (ML) estimation, β-substitution, and the Kaplan-Meier (K-M) method. Each was used to estimate the arithmetic mean (AM), geometric mean (GM), geometric standard deviation (GSD), and 95th percentile (X0.95) of the exposure distribution. The methods were challenged with computer-generated exposure datasets drawn from lognormal and mixed lognormal distributions, with sample sizes (N) from 5 to 100, GSDs from 2 to 5, censoring levels from 10 to 90%, and both single and multiple LODs. Judged by relative bias and relative root mean squared error (rMSE), the β-substitution method generally performed as well as or better than the ML and K-M methods under most of the simulated lognormal and mixed lognormal conditions. The ML method was suitable for large samples (N ≥ 30) with up to 80% censoring when the lognormal distribution had low variability (GSD = 2-3). The K-M method generally gave accurate estimates of the AM when censoring was below 50%, for both lognormal and mixed lognormal distributions. The accuracy and precision of all methods declined with high variability (GSD = 4 and 5) and small to moderate sample sizes (N < 20), but the β-substitution method remained the best of the three.
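The simulation design and the ML method summarized above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' code: it uses a single LOD, maximizes the censored lognormal likelihood with the EM algorithm (one of several ways to reach the ML estimates), and assumes the conventional definitions relative bias = (mean estimate - true)/true and rMSE = √(mean squared error)/true, which the abstract does not spell out. All function names (`ml_censored_lognormal`, `summary`, `simulate`) are hypothetical, and the β-substitution and K-M methods are not shown.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal; provides pdf, cdf and inv_cdf


def ml_censored_lognormal(detects, n_censored, lod, max_iter=5000, tol=1e-10):
    """ML estimates (mu, sigma) of ln(X) when n_censored values fall below one LOD.

    Assumption: the censored likelihood is maximized with the EM algorithm;
    each nondetect contributes the conditional moments of ln(X) given
    ln(X) < ln(lod) under the current fit.
    """
    if len(detects) < 2:
        raise ValueError("need at least two detected values")
    ys = [math.log(x) for x in detects]
    n, c = len(ys) + n_censored, math.log(lod)
    sum_y, sum_yy = sum(ys), sum(y * y for y in ys)
    mu = sum_y / len(ys)  # detects-only start; EM then moves mu downward
    var_det = max(sum_yy / len(ys) - mu * mu, 0.0)
    # Keep the starting z = (c - mu) / sigma >= -1 so the normal CDF does not underflow.
    sigma = max(math.sqrt(var_det), mu - c, 1e-3)
    for _ in range(max_iter):
        z = (c - mu) / sigma
        lam = STD.pdf(z) / STD.cdf(z)  # inverse Mills ratio for Y < c
        e_y = mu - sigma * lam  # E[Y | Y < c]
        e_yy = sigma**2 * (1 - z * lam - lam**2) + e_y**2  # E[Y^2 | Y < c]
        new_mu = (sum_y + n_censored * e_y) / n
        new_sigma = math.sqrt((sum_yy + n_censored * e_yy) / n - new_mu**2)
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


def summary(mu, sigma):
    """AM, GM, GSD and 95th percentile implied by a lognormal(mu, sigma)."""
    return {
        "AM": math.exp(mu + sigma**2 / 2),
        "GM": math.exp(mu),
        "GSD": math.exp(sigma),
        "X0.95": math.exp(mu + STD.inv_cdf(0.95) * sigma),
    }


def relative_bias(estimates, true):
    # Assumed definition: (mean of estimates - true value) / true value.
    return (sum(estimates) / len(estimates) - true) / true


def relative_rmse(estimates, true):
    # Assumed definition: root mean squared error divided by the true value.
    mse = sum((e - true) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true


def simulate(n, gm, gsd, censor_frac, reps=500, stat="AM", seed=1):
    """Relative bias and rMSE of the ML estimate of `stat` over `reps` datasets.

    A single LOD is placed at the population quantile that gives `censor_frac`
    expected censoring. Datasets with fewer than two detects are skipped,
    which itself biases results at high censoring and small n.
    """
    rng = random.Random(seed)
    mu, sigma = math.log(gm), math.log(gsd)
    lod = math.exp(mu + sigma * STD.inv_cdf(censor_frac))
    estimates = []
    for _ in range(reps):
        xs = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        detects = [x for x in xs if x >= lod]
        if len(detects) < 2:
            continue
        m, s = ml_censored_lognormal(detects, n - len(detects), lod)
        estimates.append(summary(m, s)[stat])
    true = summary(mu, sigma)[stat]
    return relative_bias(estimates, true), relative_rmse(estimates, true)


if __name__ == "__main__":
    for gsd in (2.0, 4.0):
        rb, rr = simulate(n=20, gm=1.0, gsd=gsd, censor_frac=0.5)
        print(f"GSD={gsd}: relative bias {rb:+.3f}, rMSE {rr:.3f}")
```

Running the usage loop for other values of N, GSD and `censor_frac` gives a rough feel for how the ML estimates degrade with small samples and high variability; the numbers it prints are illustrative and are not the paper's results.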
When using the ML method, practitioners should be aware that the AM can be estimated in more than one way and that the choice could lead to biased interpretation. A limitation of the β-substitution method is that it does not provide a confidence interval for the estimate. More research is needed to develop methods that improve estimation accuracy for small samples and highly censored data and that also provide uncertainty intervals.
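The abstract does not say which AM estimators it has in mind. As a hedged illustration of why the choice matters, the sketch below computes two AM estimates from the same ML fit: the parametric value exp(μ̂ + σ̂²/2), and a sample mean in which each nondetect is replaced by its conditional expectation below the LOD under the fitted lognormal. Both estimators, the single-LOD setting, and the function names (`am_plugin`, `mean_below_lod`, `am_imputed`) are illustrative assumptions, not the paper's definitions.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def am_plugin(mu, sigma):
    """AM implied by the fitted lognormal: exp(mu + sigma^2 / 2)."""
    return math.exp(mu + sigma**2 / 2)


def mean_below_lod(mu, sigma, lod):
    """E[X | X < lod] for X ~ lognormal(mu, sigma)."""
    c = math.log(lod)
    # Partial expectation below the LOD divided by the probability of being below it.
    return am_plugin(mu, sigma) * PHI((c - mu - sigma**2) / sigma) / PHI((c - mu) / sigma)


def am_imputed(detects, n_censored, lod, mu, sigma):
    """Sample mean after replacing each nondetect with E[X | X < lod] under the fit."""
    n = len(detects) + n_censored
    return (sum(detects) + n_censored * mean_below_lod(mu, sigma, lod)) / n


if __name__ == "__main__":
    # Hypothetical inputs: 6 detects above an LOD of 1.0, 4 nondetects, and
    # hypothetical ML estimates of ln(X); these are not GuLF STUDY data.
    detects = [1.3, 1.8, 2.2, 3.1, 4.9, 9.6]
    mu_hat, sigma_hat = 0.45, 1.1
    print("plug-in AM:", round(am_plugin(mu_hat, sigma_hat), 3))
    print("imputed AM:", round(am_imputed(detects, 4, 1.0, mu_hat, sigma_hat), 3))
```

In large samples the two estimates tend to agree when a single lognormal describes the data well; they can diverge in small samples or when the data depart from that model, for example with mixed distributions, so a reported ML-based AM is easier to interpret when the route used to compute it is stated.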

Original language: English (US)
Pages (from-to): 1126-1142
Number of pages: 17
Journal: Annals of Occupational Hygiene
Volume: 58
Issue number: 9
DOI: 10.1093/annhyg/meu067
State: Published - Apr 14 2014
Externally published: Yes

Keywords

  • Exposure assessment
  • left-censored data
  • the GuLF STUDY

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health
  • Medicine (all)

Cite this

Huynh, T., Ramachandran, G., Banerjee, S., Monteiro, J., Stenzel, M., Sandler, D. P., ... Stewart, P. A. (2014). Comparison of methods for analyzing left-censored occupational exposure data. Annals of Occupational Hygiene, 58(9), 1126-1142. https://doi.org/10.1093/annhyg/meu067

@article{f1e11f1cbddd4eb7b1fd2357b03d922c,
title = "Comparison of methods for analyzing left-censored occupational exposure data",
keywords = "Exposure assessment, left-censored data, the GuLF STUDY",
author = "Tran Huynh and Gurumurthy Ramachandran and Sudipto Banerjee and Joao Monteiro and Mark Stenzel and Sandler, {Dale P.} and Engel, {Lawrence S.} and Kwok, {Richard K.} and Aaron Blair and Stewart, {Patricia A.}",
year = "2014",
month = "4",
day = "14",
doi = "10.1093/annhyg/meu067",
language = "English (US)",
volume = "58",
pages = "1126--1142",
journal = "Annals of Occupational Hygiene",
issn = "0003-4878",
publisher = "Oxford University Press",
number = "9",

}
