High dimensional regression on serum analytes

Yuanzhang Li, Emanuel Schwarz, Sabine Bahn, Robert Yolken, David W. Niebuhr

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


Regression of high dimensional data is particularly difficult when the number of observations is limited. Principal component analysis, canonical correlation analysis and factor analysis are commonly used to reduce data dimensions, but they usually cannot find the most significant linear combination. The goal is usually to find a particular partition of the space X consisting of all independent factors. In this paper, we propose an approach to high dimensional regression for applications where N > K or N < K, where N is the sample size and K is the dimension of the space X. The approach starts by finding the most significant linear combination and one of the most insignificant directions, which decompose the sample space into two subspaces and reduce the dimension. We then examine the contribution of each individual variable to the most significant vectors, as measured by its coefficient in the combination, to reduce the total number of variables in the selected space without losing predictive power. We use the proposed approach to determine the potential association of 51 serum analytes with schizophrenia using data derived from a case-control study (n = 208). Numerical results demonstrate that the proposed approach can significantly improve dimension reduction.
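The two ideas in the abstract, finding one highly predictive linear combination and then pruning variables by the size of their coefficients, can be illustrated with a minimal sketch. This is not the authors' algorithm: it substitutes ordinary ridge-regularized least squares for the paper's direction search, and the function names, the ridge value, the choice of informative variables, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def most_significant_direction(X, y, ridge=1e-2):
    """Unit vector w whose projection X @ w best predicts y (ridge least squares)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    k = X.shape[1]
    # Assumption of this sketch: a small ridge term keeps the system
    # solvable even when there are fewer samples than variables (N < K).
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(k), Xc.T @ yc)
    return w / np.linalg.norm(w)

def top_variables(w, m):
    """Indices of the m variables with the largest absolute coefficients in w."""
    return np.argsort(-np.abs(w))[:m]

# Synthetic stand-in data (NOT the study data): 208 samples, 51 variables,
# of which three hypothetical variables drive a binary case/control label.
rng = np.random.default_rng(0)
n, k = 208, 51
X = rng.standard_normal((n, k))
signal = [3, 17, 42]  # hypothetical informative variables
y = (X[:, signal].sum(axis=1) + 0.5 * rng.standard_normal(n) > 0).astype(float)

w = most_significant_direction(X, y)
selected = top_variables(w, m=3)          # variables kept after pruning
score = (X - X.mean(axis=0)) @ w          # one-dimensional summary of X
```

Ranking by absolute coefficient is only meaningful here because the synthetic variables share a common scale; real analyte measurements would need to be standardized first.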

Original language: English (US)
Pages (from-to): e8672.1-e8672.12
Journal: Italian Journal of Public Health
Issue number: 4
State: Published - Dec 31 2012

Keywords


  • Gradient
  • High dimensional regression
  • Schizophrenia

ASJC Scopus subject areas

  • Epidemiology
  • Health Policy
  • Community and Home Care
  • Public Health, Environmental and Occupational Health

