Sequence analysis using logic regression

C. Kooperberg, I. Ruczinski, M. L. Leblanc, L. Hsu

Research output: Contribution to journal › Article › peer-review

100 Scopus citations

Abstract

Logic Regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of (binary) covariates. In this paper we use this algorithm to deal with single-nucleotide polymorphism (SNP) sequence data. The predictors that are found are interpretable as risk factors of the disease. Significance of these risk factors is assessed using techniques such as cross-validation, permutation tests, and independent test sets. These model selection techniques remain valid when the data are dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data, we identify the exact locations of mutations on gene 1 and gene 6 and a number of mutations on gene 2 that are associated with the affected status, without selecting any false positives.
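
As an illustration only, the idea of a Boolean combination of binary covariates scored by a permutation test can be sketched in a few lines of Python. This is not the authors' algorithm: the rule below is fixed by hand rather than found by an adaptive search, the three SNP indicators and all function names are hypothetical, and the labels are permuted freely, which ignores the family structure of the data discussed in the abstract.

```python
import random


def logic_predictor(row):
    """Hypothetical Boolean rule over three binary SNP indicators:
    (snp_a AND NOT snp_b) OR snp_c. Returns 0 or 1."""
    snp_a, snp_b, snp_c = row
    return int((snp_a and not snp_b) or snp_c)


def misclassification(rule, X, y):
    """Fraction of subjects whose affected status the rule predicts wrongly."""
    return sum(rule(row) != status for row, status in zip(X, y)) / len(y)


def permutation_p_value(rule, X, y, n_perm=1000, seed=0):
    """Permutation p-value for a FIXED rule (a simplification: a full
    analysis would refit the model on every permuted data set).
    Counts how often shuffled labels score at least as well as the real ones."""
    rng = random.Random(seed)
    observed = misclassification(rule, X, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if misclassification(rule, X, y_perm) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic data (assumption): status is generated by the rule itself.
    rng = random.Random(1)
    X = [tuple(rng.randint(0, 1) for _ in range(3)) for _ in range(200)]
    y = [logic_predictor(row) for row in X]
    print(misclassification(logic_predictor, X, y))
    print(permutation_p_value(logic_predictor, X, y, n_perm=200))
```

A small p-value here only says that the fixed rule fits the observed labels better than it fits shuffled ones; it does not by itself account for the search over many candidate rules.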

Original language: English (US)
Pages (from-to): S626-S631
Journal: Genetic Epidemiology
Volume: 21
Issue number: SUPPL. 1
State: Published - 2001
Externally published: Yes

Keywords

  • Adaptive estimation
  • Boolean combinations
  • SNP
  • Simulated annealing

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)
