Estimation of interaction effects using pooled biospecimens in a case-control study

Michelle R. Danaher, Paul S. Albert, Aninyda Roy, Enrique F. Schisterman

Research output: Contribution to journal, Article

Abstract

Pooling, or physically mixing biospecimens, prior to evaluating biomarkers dramatically reduces biomarker evaluation cost, reduces the quantity of biospecimens required of each individual, and may reduce the percentage of laboratory measurements below the lower limit of detection. Motivated by a case-control study on miscarriage (binary outcome) and cytokines (continuous exposures), we are interested in estimating parameters in a logistic regression, where individuals with the same disease status (with or without a miscarriage) are paired and their pooled cytokine concentrations are assessed. Previous research has proposed a set-based logistic model to evaluate the relationship between a disease and pooled exposures. While the set-based logistic model is very useful for estimating main effects, it cannot estimate interactions of continuous exposures when both are measured in pools. Therefore, we propose using the expectation maximization (EM) algorithm to obtain estimators of all parameters in the logistic regression model, including interaction effects. Using a simulation study, we compare efficiency under different scenarios in which exposures have been measured in pools and individually. Our simulations show that randomly sampling half of the available biospecimens is less efficient than pooling pairs of biospecimens stratified by disease status. The EM algorithm provides a method for estimating interaction effects when biospecimens have already been pooled for other reasons, such as the gain in efficiency for estimating main effects demonstrated by previous research. This manuscript demonstrates that the EM algorithm offers a promising approach to estimating interaction effects from pooled biospecimens.
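
One way to see why interactions are hard to recover from pooled data is that an interaction term depends on each individual's product of the two exposures, whereas a pool only reveals the average of each exposure separately. The Python sketch below illustrates this with simulated pairs. It is not the authors' code and does not implement the set-based model or the EM algorithm; the log-normal exposure distribution, the equal-volume averaging, and the names `pool_pairs` and `simulate` are assumptions made for illustration only.

```python
import random

def pool_pairs(values):
    """Average consecutive pairs, mimicking equal-volume physical pooling (an assumption)."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values) - 1, 2)]

def simulate(n_pairs=1000, seed=0):
    rng = random.Random(seed)
    # Hypothetical skewed (log-normal) exposures for individuals sharing one disease status.
    x1 = [rng.lognormvariate(0.0, 1.0) for _ in range(2 * n_pairs)]
    x2 = [rng.lognormvariate(0.0, 1.0) for _ in range(2 * n_pairs)]
    # What the laboratory would report: one pooled value per pair and per exposure.
    p1, p2 = pool_pairs(x1), pool_pairs(x2)
    # What an interaction term would need: the within-pair average of individual products.
    needed = pool_pairs([a * b for a, b in zip(x1, x2)])
    # What the pooled data can offer directly: the product of the two pooled values.
    available = [a * b for a, b in zip(p1, p2)]
    return needed, available

needed, available = simulate()
max_gap = max(abs(n - a) for n, a in zip(needed, available))
print(f"largest |needed - available| across pools: {max_gap:.3f}")
```

For a single pair with exposures (a1, b1) and (a2, b2), the gap between the two quantities is (a1 - a2)(b1 - b2)/4, so it vanishes only when the paired individuals agree on at least one exposure. The individual-level products are therefore unobserved under pooling, which is the kind of gap an approach such as EM is designed to handle.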

Original language: English (US)
Journal: Statistics in Medicine
DOI: 10.1002/sim.6798
State: Accepted/In press - 2015
Externally published: Yes

Fingerprint

Case-control Study
Interaction Effects
Case-Control Studies
Expectation-maximization Algorithm
Logistic Models
Cytokines
Pooling
Logistic Model
Main Effect
Biomarkers
Spontaneous Abortion
Binary Outcomes
Logistic Regression Model
Logistic Regression
Estimate
Percentage
Research
Simulation Study
Limit of Detection
Estimator

Keywords

  • Cytokines
  • Expectation maximization
  • Logistic regression
  • Pooling designs
  • Skewed biomarkers

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability

Cite this

Estimation of interaction effects using pooled biospecimens in a case-control study. / Danaher, Michelle R.; Albert, Paul S.; Roy, Aninyda; Schisterman, Enrique F.

In: Statistics in Medicine, 2015.

@article{063e19d067404aa8b339e7f8f17a354b,
title = "Estimation of interaction effects using pooled biospecimens in a case-control study",
abstract = "Pooling, or physically mixing biospecimens, prior to evaluating biomarkers dramatically reduces biomarker evaluation cost, reduces the quantity of biospecimens required of each individual, and may reduce the percentage of laboratory measurements below the lower limit of detection. Motivated by a case-control study on miscarriage (binary outcome) and cytokines (continuous exposures), we are interested in estimating parameters in a logistic regression, where individuals with the same disease status (with or without a miscarriage) are paired and their pooled cytokine concentrations are assessed. Previous research has proposed a set-based logistic model to evaluate the relationship between a disease and pooled exposures. While the set-based logistic model is very useful for estimating main effects, it cannot estimate interactions of continuous exposures when both are measured in pools. Therefore, we propose using the expectation maximization (EM) algorithm to obtain estimators of all parameters in the logistic regression model, including interaction effects. Using a simulation study, we compare efficiency under different scenarios in which exposures have been measured in pools and individually. Our simulations show that randomly sampling half of the available biospecimens is less efficient than pooling pairs of biospecimens stratified by disease status. The EM algorithm provides a method for estimating interaction effects when biospecimens have already been pooled for other reasons, such as the gain in efficiency for estimating main effects demonstrated by previous research. This manuscript demonstrates that the EM algorithm offers a promising approach to estimating interaction effects from pooled biospecimens.",
keywords = "Cytokines, Expectation maximization, Logistic regression, Pooling designs, Skewed biomarkers",
author = "Danaher, {Michelle R.} and Albert, {Paul S.} and Roy, Aninyda and Schisterman, {Enrique F.}",
year = "2015",
doi = "10.1002/sim.6798",
language = "English (US)",
journal = "Statistics in Medicine",
issn = "0277-6715",
publisher = "John Wiley and Sons Ltd",

}

TY - JOUR

T1 - Estimation of interaction effects using pooled biospecimens in a case-control study

AU - Danaher, Michelle R.

AU - Albert, Paul S.

AU - Roy, Aninyda

AU - Schisterman, Enrique F.

PY - 2015

Y1 - 2015

N2 - Pooling, or physically mixing biospecimens, prior to evaluating biomarkers dramatically reduces biomarker evaluation cost, reduces the quantity of biospecimens required of each individual, and may reduce the percentage of laboratory measurements below the lower limit of detection. Motivated by a case-control study on miscarriage (binary outcome) and cytokines (continuous exposures), we are interested in estimating parameters in a logistic regression, where individuals with the same disease status (with or without a miscarriage) are paired and their pooled cytokine concentrations are assessed. Previous research has proposed a set-based logistic model to evaluate the relationship between a disease and pooled exposures. While the set-based logistic model is very useful for estimating main effects, it cannot estimate interactions of continuous exposures when both are measured in pools. Therefore, we propose using the expectation maximization (EM) algorithm to obtain estimators of all parameters in the logistic regression model, including interaction effects. Using a simulation study, we compare efficiency under different scenarios in which exposures have been measured in pools and individually. Our simulations show that randomly sampling half of the available biospecimens is less efficient than pooling pairs of biospecimens stratified by disease status. The EM algorithm provides a method for estimating interaction effects when biospecimens have already been pooled for other reasons, such as the gain in efficiency for estimating main effects demonstrated by previous research. This manuscript demonstrates that the EM algorithm offers a promising approach to estimating interaction effects from pooled biospecimens.

KW - Cytokines

KW - Expectation maximization

KW - Logistic regression

KW - Pooling designs

KW - Skewed biomarkers

UR - http://www.scopus.com/inward/record.url?scp=84947997986&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84947997986&partnerID=8YFLogxK

U2 - 10.1002/sim.6798

DO - 10.1002/sim.6798

M3 - Article

C2 - 26553532

AN - SCOPUS:84947997986

JO - Statistics in Medicine

JF - Statistics in Medicine

SN - 0277-6715

ER -