Knowledge-based data analysis comes of age

Michael F. Ochs

doi:10.1093/bib/bbp044

School of Medicine

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.

Original language	English (US)
Article number	bbp044
Pages (from-to)	30-39
Number of pages	10
Journal	Briefings in Bioinformatics
Volume	11
Issue number	1
DOIs	https://doi.org/10.1093/bib/bbp044
State	Published - Oct 23 2009

Keywords

Bayesian analysis
Computational molecular biology
Databases
Metabolic pathways
Signal pathways

ASJC Scopus subject areas

Molecular Biology
Information Systems

Cite this

@article{a4989070243646cd821cbca1e1493fab,
  title = "Knowledge-based data analysis comes of age",
  abstract = "The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.",
  keywords = "Bayesian analysis, Computational molecular biology, Databases, Metabolic pathways, Signal pathways",
  author = "Ochs, {Michael F.}",
  year = "2009",
  month = oct,
  day = "23",
  doi = "10.1093/bib/bbp044",
  language = "English (US)",
  volume = "11",
  pages = "30--39",
  journal = "Briefings in Bioinformatics",
  issn = "1467-5463",
  publisher = "Oxford University Press",
  number = "1",
}

TY  - JOUR
T1  - Knowledge-based data analysis comes of age
AU  - Ochs, Michael F.
PY  - 2009/10/23
Y1  - 2009/10/23
N2  - The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.
AB  - The emergence of high-throughput technologies for measuring biological systems has introduced problems for data interpretation that must be addressed for proper inference. First, analysis techniques need to be matched to the biological system, reflecting in their mathematical structure the underlying behavior being studied. When this is not done, mathematical techniques will generate answers, but the values and reliability estimates may not accurately reflect the biology. Second, analysis approaches must address the vast excess in variables measured (e.g. transcript levels of genes) over the number of samples (e.g. tumors, time points), known as the 'large-p, small-n' problem. In large-p, small-n paradigms, standard statistical techniques generally fail, and computational learning algorithms are prone to overfit the data. Here we review the emergence of techniques that match mathematical structure to the biology, the use of integrated data and prior knowledge to guide statistical analysis, and the recent emergence of analysis approaches utilizing simple biological models. We show that novel biological insights have been gained using these techniques.
KW  - Bayesian analysis
KW  - Computational molecular biology
KW  - Databases
KW  - Metabolic pathways
KW  - Signal pathways
UR  - http://www.scopus.com/inward/record.url?scp=77950344077&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=77950344077&partnerID=8YFLogxK
U2  - 10.1093/bib/bbp044
DO  - 10.1093/bib/bbp044
M3  - Article
C2  - 19854753
AN  - SCOPUS:77950344077
SN  - 1467-5463
VL  - 11
SP  - 30
EP  - 39
JO  - Briefings in Bioinformatics
JF  - Briefings in Bioinformatics
IS  - 1
M1  - bbp044
ER  -
