Monte-Carlo methods for determining optimal number of significant variables. Application to mouse urinary profiles

Kanet Wongravee, Gavin R. Lloyd, John Hall, Maria E. Holmboe, Michele L. Schaefer, Randall R. Reed, Jose Trevejo, Richard G. Brereton

Research output: Contribution to journal › Article

Abstract

Three methods for variable selection are described, namely the t-statistic, Partial Least Squares Discriminant Analysis (PLS-DA) weights and PLS-DA regression coefficients, with the aim of determining which variables are the most significant markers for discriminating between two groups: a variable's level of significance is related to the magnitude of its statistic. Monte-Carlo methods are employed to determine the empirical significance of variables, by randomly permuting the class membership 5000 times to obtain null distributions and comparing the observed statistic for each variable with its null distribution. Seven simulated datasets, each consisting of 200 samples divided equally between two classes and 300 variables, are constructed: in one dataset there are no induced correlations between variables; in two datasets correlations are induced but there is no induced separation between the classes; and in four datasets separation is induced by selecting 20 of the variables to be discriminators. In addition, two metabolomic datasets, consisting of GCMS profiles of urinary extracts from mice, were analysed to determine the effect of stress and the effect of diet on the urinary chemosignal. It is shown that the t-statistic combined with Monte-Carlo permutations provides similar results to PLS weights. PLS regression coefficients find the fewest markers but, for the simulations, give the lowest false positive rates.
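
The permutation scheme described in the abstract can be illustrated with a minimal Python sketch, shown here for the t-statistic only. It is not the authors' implementation. The function names, the Welch form of the t-statistic, the unit mean shift used to induce separation, the 0.05 threshold and the reduced permutation count in the demo are all assumptions made for illustration. Only the dataset dimensions (200 samples, 300 variables, 20 discriminators) and the 5000-permutation default come from the abstract.

```python
import numpy as np


def t_statistics(X, y):
    """Two-sample t-statistic (Welch form, an assumption) for each column of X.

    X is (samples x variables); y holds 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (a.mean(axis=0) - b.mean(axis=0)) / se


def permutation_p_values(X, y, n_perm=5000, seed=0):
    """Empirical p-value per variable from randomly permuted class labels."""
    rng = np.random.default_rng(seed)
    observed = np.abs(t_statistics(X, y))  # significance judged by magnitude
    exceed = np.zeros(X.shape[1])
    for _ in range(n_perm):
        null = np.abs(t_statistics(X, rng.permutation(y)))
        exceed += null >= observed
    # Add-one correction so that no empirical p-value is exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical simulation with the dimensions given in the abstract:
# 200 samples split equally between two classes, 300 variables,
# the first 20 of which are made discriminatory.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 300))
X[y == 1, :20] += 1.0  # assumed effect size, not taken from the paper

# 1000 permutations keep the demo quick; the abstract uses 5000.
p = permutation_p_values(X, y, n_perm=1000)
selected = np.flatnonzero(p < 0.05)  # 0.05 is an illustrative threshold
false_positive_rate = np.mean(p[20:] < 0.05)
```

Because the discriminators are known in a simulation, the false positive rate can be read directly from the non-discriminatory variables, which is the kind of comparison the abstract reports across the three methods.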

Original language: English (US)
Pages (from-to): 387-406
Number of pages: 20
Journal: Metabolomics
Volume: 5
Issue number: 4
State: Published - Dec 1 2009

Keywords

  • GCMS
  • Monte-Carlo methods
  • Mouse urine
  • Partial Least Squares Discriminant Analysis
  • Variable selection
  • Volatiles

ASJC Scopus subject areas

  • Endocrinology, Diabetes and Metabolism
  • Biochemistry
  • Clinical Biochemistry
