Multivariate analysis and visualization of splicing correlations in single-gene transcriptomes

Mark C. Emerick, Giovanni Parmigiani, William S. Agnew

Research output: Research - peer-review, Article

Abstract

Background: RNA metabolism, through 'combinatorial splicing', can generate enormous structural diversity in the proteome. Alternative domains may interact, however, with unpredictable phenotypic consequences, necessitating integrated RNA-level regulation of molecular composition. Splicing correlations within transcripts of single genes provide valuable clues to functional relationships among molecular domains as well as genomic targets for higher-order splicing regulation.

Results: We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in 'clock plots' and linkage grids. Higher-order correlations are assessed statistically through Monte Carlo analysis of a log-linear model with an empirical-Bayes estimate of the true probabilities of observed and unobserved splice forms. Log-linear coefficients are visualized in a 'spliceprint,' a signature of splice correlations in the transcriptome. We present two novel metrics: the linkage change index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that is more sensitive than the integrated squared error when applied to sparsely populated tables, and unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are inherent to single-gene libraries.

Conclusion: Patterns of splicing correlations are revealed, which span a broad range of interaction order and change in development. The methods have a broad scope of applicability, beyond the single gene - including, for example, multiple gene interactions in the complete transcriptome.

Language: English (US)
Article number: 16
Journal: BMC Bioinformatics
Volume: 8
DOI: 10.1186/1471-2105-8-16
State: Published - 2007

Fingerprint

Multivariate Analysis
Visualization
Gene
Genes
Transcriptome
Gene Library
RNA
Linkage
Higher Order
Metric
Interaction
Libraries
Proteome
Metabolism
Clocks
Tissue
Chemical analysis
Linear Models
Proteins
Integrated Squared Error

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

Multivariate analysis and visualization of splicing correlations in single-gene transcriptomes. / Emerick, Mark C.; Parmigiani, Giovanni; Agnew, William S.

In: BMC Bioinformatics, Vol. 8, 16, 2007.

@article{b2fb214d42354375994373f735d26d01,
title = "Multivariate analysis and visualization of splicing correlations in single-gene transcriptomes",
abstract = "Background: RNA metabolism, through 'combinatorial splicing', can generate enormous structural diversity in the proteome. Alternative domains may interact, however, with unpredictable phenotypic consequences, necessitating integrated RNA-level regulation of molecular composition. Splicing correlations within transcripts of single genes provide valuable clues to functional relationships among molecular domains as well as genomic targets for higher-order splicing regulation. Results: We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in 'clock plots' and linkage grids. Higher-order correlations are assessed statistically through Monte Carlo analysis of a log-linear model with an empirical-Bayes estimate of the true probabilities of observed and unobserved splice forms. Log-linear coefficients are visualized in a 'spliceprint,' a signature of splice correlations in the transcriptome. We present two novel metrics: the linkage change index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that is more sensitive than the integrated squared error when applied to sparsely populated tables, and unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are inherent to single-gene libraries. Conclusion: Patterns of splicing correlations are revealed, which span a broad range of interaction order and change in development. The methods have a broad scope of applicability, beyond the single gene - including, for example, multiple gene interactions in the complete transcriptome.",
author = "Emerick, {Mark C.} and Parmigiani, Giovanni and Agnew, {William S.}",
year = "2007",
doi = "10.1186/1471-2105-8-16",
volume = "8",
pages = "16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
}

TY - JOUR

T1 - Multivariate analysis and visualization of splicing correlations in single-gene transcriptomes

AU - Emerick, Mark C.

AU - Parmigiani, Giovanni

AU - Agnew, William S.

PY - 2007

Y1 - 2007

AB - Background: RNA metabolism, through 'combinatorial splicing', can generate enormous structural diversity in the proteome. Alternative domains may interact, however, with unpredictable phenotypic consequences, necessitating integrated RNA-level regulation of molecular composition. Splicing correlations within transcripts of single genes provide valuable clues to functional relationships among molecular domains as well as genomic targets for higher-order splicing regulation. Results: We present tools to visualize complex splicing patterns in full-length cDNA libraries. Developmental changes in pair-wise correlations are presented vectorially in 'clock plots' and linkage grids. Higher-order correlations are assessed statistically through Monte Carlo analysis of a log-linear model with an empirical-Bayes estimate of the true probabilities of observed and unobserved splice forms. Log-linear coefficients are visualized in a 'spliceprint,' a signature of splice correlations in the transcriptome. We present two novel metrics: the linkage change index, which measures the directional change in pair-wise correlation with tissue differentiation, and the accuracy index, a very simple goodness-of-fit metric that is more sensitive than the integrated squared error when applied to sparsely populated tables, and unlike chi-square, does not diverge at low variance. Considerable attention is given to sparse contingency tables, which are inherent to single-gene libraries. Conclusion: Patterns of splicing correlations are revealed, which span a broad range of interaction order and change in development. The methods have a broad scope of applicability, beyond the single gene - including, for example, multiple gene interactions in the complete transcriptome.

UR - http://www.scopus.com/inward/record.url?scp=33846962479&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846962479&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-8-16

DO - 10.1186/1471-2105-8-16

M3 - Article

VL - 8

JO - BMC Bioinformatics

T2 - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 16

ER -