High dimensional regression on serum analytes

Yuanzhang Li; Emanuel Schwarz; Sabine Bahn; Robert Yolken; David W. Niebuhr

doi:10.2427/8672

School of Medicine

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Regression of high dimensional data is particularly difficult when the number of observations is limited. Principal component analysis, canonical correlation analysis and factor analysis are commonly used to reduce data dimensions, but they usually cannot find the most significant linear combination. The goal is usually to find a particular partition of the space X consisting of all independent factors. In this paper, we propose an approach to high dimensional regression for applications where N>K or N<K, where N is the sample size and K is the dimension of the space X. The approach starts by finding the most significant linear combination and one of the most insignificant directions in order to decompose the sample space into two subspaces and reduce the dimension. Further, we examine the contributions of individual variables to the most significant vectors through the coefficients of the combinations, reducing the total number of variables in the selected space without losing predictive power. We use the proposed approach to determine the potential association of 51 serum analytes with schizophrenia using data derived from a case-control study (n=208). Numerical results demonstrate that the proposed approach can significantly improve dimension reduction.
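
The two steps the abstract outlines (find a most significant linear combination, then rank variables by their coefficients) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes a ridge-regularized least-squares direction as a stand-in that remains solvable when N<K, and it runs on synthetic data. The function names, the `ridge` value and the `keep` parameter are all hypothetical.

```python
import numpy as np


def most_significant_direction(X, y, ridge=1e-3):
    """Unit vector w for which X @ w tracks y as closely as possible.

    Assumption (not from the paper): a small ridge penalty keeps the
    normal equations solvable when N < K and X'X is singular.
    """
    Xc = X - X.mean(axis=0)          # centre each variable
    yc = y - y.mean()                # centre the response
    k = Xc.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + ridge * np.eye(k), Xc.T @ yc)
    return w / np.linalg.norm(w)


def select_variables(w, keep):
    """Sorted indices of the `keep` variables with the largest |coefficient|."""
    largest_first = np.argsort(np.abs(w))[::-1]
    return np.sort(largest_first[:keep])


# Synthetic example with fewer observations than variables (N < K).
rng = np.random.default_rng(0)
n, k = 40, 60
X = rng.standard_normal((n, k))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.1 * rng.standard_normal(n)

w = most_significant_direction(X, y)
selected = select_variables(w, keep=10)
corr = np.corrcoef(X @ w, y)[0, 1]   # in-sample correlation of X @ w with y
```

Ranking by absolute coefficient is only meaningful when the variables are on comparable scales, so real analyte data would normally be standardized first. The in-sample correlation is optimistic when N<K and would need checking on held-out data.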

Original language	English (US)
Pages (from-to)	e8672.1-e8672.12
Journal	Italian Journal of Public Health
Volume	9
Issue number	4
DOIs	https://doi.org/10.2427/8672
State	Published - Dec 31 2012

Keywords

Gradient
High dimensional regression
Schizophrenia

ASJC Scopus subject areas

Epidemiology
Health Policy
Community and Home Care
Public Health, Environmental and Occupational Health

Cite this

@article{d38b64719a604f0bb06dd9fb575062d1,
title = "High dimensional regression on serum analytes",
abstract = "Regression of high dimensional data is particularly difficult when the number of observations is limited. Principal component analysis, canonical correlation analysis and factor analysis are commonly used to reduce data dimensions, but they usually cannot find the most significant linear combination. The goal is usually to find a particular partition of the space X consisting of all independent factors. In this paper, we propose an approach to high dimensional regression for applications where N>K or N<K, where N is the sample size and K is the dimension of the space X. The approach starts by finding the most significant linear combination and one of the most insignificant directions in order to decompose the sample space into two subspaces and reduce the dimension. Further, we examine the contributions of individual variables to the most significant vectors through the coefficients of the combinations, reducing the total number of variables in the selected space without losing predictive power. We use the proposed approach to determine the potential association of 51 serum analytes with schizophrenia using data derived from a case-control study (n=208). Numerical results demonstrate that the proposed approach can significantly improve dimension reduction.",
keywords = "Gradient, High dimensional regression, Schizophrenia",
author = "Yuanzhang Li and Emanuel Schwarz and Sabine Bahn and Robert Yolken and Niebuhr, {David W.}",
year = "2012",
month = dec,
day = "31",
doi = "10.2427/8672",
language = "English (US)",
volume = "9",
pages = "e8672.1--e8672.12",
journal = "Italian Journal of Public Health",
issn = "1723-7815",
publisher = "Prex",
number = "4",
}

TY  - JOUR
T1  - High dimensional regression on serum analytes
AU  - Li, Yuanzhang
AU  - Schwarz, Emanuel
AU  - Bahn, Sabine
AU  - Yolken, Robert
AU  - Niebuhr, David W.
PY  - 2012/12/31
Y1  - 2012/12/31
N2  - Regression of high dimensional data is particularly difficult when the number of observations is limited. Principal component analysis, canonical correlation analysis and factor analysis are commonly used to reduce data dimensions, but they usually cannot find the most significant linear combination. The goal is usually to find a particular partition of the space X consisting of all independent factors. In this paper, we propose an approach to high dimensional regression for applications where N>K or N<K, where N is the sample size and K is the dimension of the space X. The approach starts by finding the most significant linear combination and one of the most insignificant directions in order to decompose the sample space into two subspaces and reduce the dimension. Further, we examine the contributions of individual variables to the most significant vectors through the coefficients of the combinations, reducing the total number of variables in the selected space without losing predictive power. We use the proposed approach to determine the potential association of 51 serum analytes with schizophrenia using data derived from a case-control study (n=208). Numerical results demonstrate that the proposed approach can significantly improve dimension reduction.
KW  - Gradient
KW  - High dimensional regression
KW  - Schizophrenia
UR  - http://www.scopus.com/inward/record.url?scp=84871593851&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84871593851&partnerID=8YFLogxK
U2  - 10.2427/8672
DO  - 10.2427/8672
M3  - Article
AN  - SCOPUS:84871593851
SN  - 1723-7815
VL  - 9
SP  - e8672.1-e8672.12
JO  - Italian Journal of Public Health
JF  - Italian Journal of Public Health
IS  - 4
ER  -
