Genes@Work

An efficient algorithm for pattern discovery and multivariate feature selection in gene expression data

Jorge Lepre, Jeremy Rice, Yuhai Tu, Gustavo Stolovitzky

Research output: Contribution to journal, Article

Abstract

Motivation: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. Results: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods.

Original language: English (US)
Pages (from-to): 1033-1044
Number of pages: 12
Journal: Bioinformatics
Volume: 20
Issue number: 7
DOIs: 10.1093/bioinformatics/bth035
State: Published - May 1 2004
Externally published: Yes

Fingerprint

Pattern Discovery
Gene Expression Data
Gene expression
Feature Selection
Feature extraction
Efficient Algorithms
Genes
Gene
Gene Expression
Multivariate Statistics
Deterministic Algorithm
Differentiate
Phenotype
Enumeration
Assays
Lymphoma
Data analysis
High-dimensional
Statistics
Entire

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Genes@Work: An efficient algorithm for pattern discovery and multivariate feature selection in gene expression data. / Lepre, Jorge; Rice, Jeremy; Tu, Yuhai; Stolovitzky, Gustavo.

In: Bioinformatics, Vol. 20, No. 7, 01.05.2004, p. 1033-1044.

Lepre, Jorge; Rice, Jeremy; Tu, Yuhai; Stolovitzky, Gustavo. / Genes@Work: An efficient algorithm for pattern discovery and multivariate feature selection in gene expression data. In: Bioinformatics. 2004; Vol. 20, No. 7. pp. 1033-1044.
@article{a031f5f1163446daba21fb98505630ec,
title = "Genes@Work: An efficient algorithm for pattern discovery and multivariate feature selection in gene expression data",
abstract = "Motivation: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. Results: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods.",
author = "Jorge Lepre and Jeremy Rice and Yuhai Tu and Gustavo Stolovitzky",
year = "2004",
month = "5",
day = "1",
doi = "10.1093/bioinformatics/bth035",
language = "English (US)",
volume = "20",
pages = "1033--1044",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "7",

}

TY - JOUR

T1 - Genes@Work

T2 - An efficient algorithm for pattern discovery and multivariate feature selection in gene expression data

AU - Lepre, Jorge

AU - Rice, Jeremy

AU - Tu, Yuhai

AU - Stolovitzky, Gustavo

PY - 2004/5/1

Y1 - 2004/5/1

N2 - Motivation: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. Results: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods.

AB - Motivation: Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. Results: We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existent patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods.

UR - http://www.scopus.com/inward/record.url?scp=2442638940&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442638940&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bth035

DO - 10.1093/bioinformatics/bth035

M3 - Article

VL - 20

SP - 1033

EP - 1044

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 7

ER -