Integrative analysis of multiple ChIP-X data sets using correlation motifs

Hong Kai Ji, Yingying Wei

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Genome-wide chromatin immunoprecipitation experiments including ChIP-seq and ChIP-chip, jointly referred to as ChIP-X, are high-throughput technologies to map protein-DNA interactions in the genome. When multiple related ChIP-X data sets are available, separately analyzing each data set is not optimal because it may lack power to detect consistent but relatively weak signals in multiple studies. Jointly analyzing all data sets may allow one to borrow information across studies to improve signal detection. However, a common problem in data integration is the difficulty in handling data set-specific signals that cannot be dealt with by simply assuming that the signal status for each genomic locus is the same across all studies. On the other hand, an integration model that naively enumerates all possible study-specificity patterns has exponential complexity, because there are 2^D possible combinatorial patterns of signal presence and absence for D studies. Correlation motifs provide a useful solution to this problem. By introducing a small number of latent probability vectors called correlation motifs, this approach can describe the major correlation structure among multiple data sets, which can then be used to guide information sharing across data sets. The correlation motif approach is capable of improving signal detection. At the same time, it does not have the problem of exponential model complexity and is flexible enough to handle all possible data set-specific signal configurations.
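
The complexity contrast described in the abstract can be illustrated with a minimal sketch. It assumes, as a simplification not spelled out in the abstract, that each correlation motif is a vector of D per-study signal probabilities and that a locus draws its motif from K mixing weights; the function names and all numeric values below are hypothetical and chosen only for illustration.

```python
from itertools import product


def full_enumeration_params(num_studies):
    """Free parameters when each presence/absence pattern gets its own probability."""
    return 2 ** num_studies - 1


def motif_params(num_studies, num_motifs):
    """Free parameters for K motifs: K vectors of D probabilities plus K-1 mixing weights."""
    return num_motifs * num_studies + (num_motifs - 1)


def pattern_probability(pattern, weights, motifs):
    """Probability of one 0/1 pattern under a mixture of independent-Bernoulli motifs."""
    total = 0.0
    for weight, motif in zip(weights, motifs):
        prob = weight
        for present, q in zip(pattern, motif):
            prob *= q if present == 1 else (1.0 - q)
        total += prob
    return total


# Hypothetical example: D = 3 studies, K = 2 motifs (made-up values).
weights = [0.9, 0.1]
motifs = [
    [0.01, 0.01, 0.01],  # mostly background in every study
    [0.90, 0.90, 0.20],  # signal shared mainly by studies 1 and 2
]

# Every one of the 2^D patterns still receives nonzero probability,
# even though only K * D + (K - 1) parameters are used.
all_patterns = list(product([0, 1], repeat=3))
probs = {a: pattern_probability(a, weights, motifs) for a in all_patterns}

for d in (3, 10, 20):
    print(d, full_enumeration_params(d), motif_params(d, num_motifs=3))
```

Under these assumptions the parameter count grows linearly in D for a fixed number of motifs, while the mixture still assigns positive probability to every data set-specific configuration.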

Original language: English (US)
Title of host publication: Integrating Omics Data
Publisher: Cambridge University Press
Pages: 110-132
Number of pages: 23
ISBN (Electronic): 9781107706484
ISBN (Print): 9781107069114
DOIs: https://doi.org/10.1017/CBO9781107706484.006
State: Published - Jan 1 2015

Fingerprint

Genome
Protein Interaction Maps
Information Dissemination
Chromatin Immunoprecipitation
Datasets
Technology
DNA

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Ji, H. K., & Wei, Y. (2015). Integrative analysis of multiple ChIP-X data sets using correlation motifs. In Integrating Omics Data (pp. 110-132). Cambridge University Press. https://doi.org/10.1017/CBO9781107706484.006

@inbook{ef48c7aae9c0491d96196be0ce74999a,
title = "Integrative analysis of multiple ChIP-X data sets using correlation motifs",
abstract = "Genome-wide chromatin immunoprecipitation experiments including ChIP-seq and ChIP-chip, jointly referred to as ChIP-X, are high-throughput technologies to map protein-DNA interactions in the genome. When multiple related ChIP-X data sets are available, separately analyzing each data set is not optimal because it may lack power to detect consistent but relatively weak signals in multiple studies. Jointly analyzing all data sets may allow one to borrow information across studies to improve signal detection. However, a common problem in data integration is the difficulty in handling data set-specific signals that cannot be dealt with by simply assuming that the signal status for each genomic locus is the same across all studies. On the other hand, an integration model that naively enumerates all possible study-specificity patterns has exponential complexity, because there are $2^{D}$ possible combinatorial patterns of signal presence and absence for D studies. Correlation motifs provide a useful solution to this problem. By introducing a small number of latent probability vectors called correlation motifs, this approach can describe the major correlation structure among multiple data sets, which can then be used to guide information sharing across data sets. The correlation motif approach is capable of improving signal detection. At the same time, it does not have the problem of exponential model complexity and is flexible enough to handle all possible data set-specific signal configurations.",
author = "Ji, {Hong Kai} and Yingying Wei",
year = "2015",
month = "1",
day = "1",
doi = "10.1017/CBO9781107706484.006",
language = "English (US)",
isbn = "9781107069114",
pages = "110--132",
booktitle = "Integrating Omics Data",
publisher = "Cambridge University Press",

}

TY - CHAP

T1 - Integrative analysis of multiple ChIP-X data sets using correlation motifs

AU - Ji, Hong Kai

AU - Wei, Yingying

PY - 2015/1/1

Y1 - 2015/1/1

AB - Genome-wide chromatin immunoprecipitation experiments including ChIP-seq and ChIP-chip, jointly referred to as ChIP-X, are high-throughput technologies to map protein-DNA interactions in the genome. When multiple related ChIP-X data sets are available, separately analyzing each data set is not optimal because it may lack power to detect consistent but relatively weak signals in multiple studies. Jointly analyzing all data sets may allow one to borrow information across studies to improve signal detection. However, a common problem in data integration is the difficulty in handling data set-specific signals that cannot be dealt with by simply assuming that the signal status for each genomic locus is the same across all studies. On the other hand, an integration model that naively enumerates all possible study-specificity patterns has exponential complexity, because there are 2^D possible combinatorial patterns of signal presence and absence for D studies. Correlation motifs provide a useful solution to this problem. By introducing a small number of latent probability vectors called correlation motifs, this approach can describe the major correlation structure among multiple data sets, which can then be used to guide information sharing across data sets. The correlation motif approach is capable of improving signal detection. At the same time, it does not have the problem of exponential model complexity and is flexible enough to handle all possible data set-specific signal configurations.

UR - http://www.scopus.com/inward/record.url?scp=85014837003&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85014837003&partnerID=8YFLogxK

U2 - 10.1017/CBO9781107706484.006

DO - 10.1017/CBO9781107706484.006

M3 - Chapter

AN - SCOPUS:85014837003

SN - 9781107069114

SP - 110

EP - 132

BT - Integrating Omics Data

PB - Cambridge University Press

ER -