Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space

Rahul Karnik, Michael Beer

Research output: Contribution to journal › Article

Abstract

The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
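The abstract's central idea, scoring sequences with a full PWM and learning a score threshold that separates bound from unbound sequences, can be illustrated with a minimal sketch. This is not the authors' MotifSpec implementation: the function names (`build_pwm`, `best_score`, `learn_threshold`), the pseudocount, the uniform background, the accuracy objective, and the toy sequences are all assumptions made for illustration, and the dynamic search space is not modeled.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from aligned, equal-length sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def best_score(pwm, seq):
    """Return the score of the best-matching window in seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def learn_threshold(pwm, positives, negatives):
    """Pick the score cutoff that maximizes classification accuracy (assumed objective)."""
    pos_scores = [best_score(pwm, s) for s in positives]
    neg_scores = [best_score(pwm, s) for s in negatives]
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(pos_scores + neg_scores)):
        correct = sum(s >= cut for s in pos_scores) + sum(s < cut for s in neg_scores)
        acc = correct / (len(pos_scores) + len(neg_scores))
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Toy data, invented for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]
positives = ["AATGACTCAGG", "CCTGAGTCATT", "GGTGACTCACC"]
negatives = ["AAAAAAAAAAA", "CCGCGCGCGCC", "GTGTGTGTGTG"]

pwm = build_pwm(sites)
threshold, accuracy = learn_threshold(pwm, positives, negatives)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
```

A sequence is then called positive when its best window score reaches the learned threshold, which is what distinguishes a discriminative use of the PWM from simply reporting over-represented words.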

Original language: English (US)
Article number: e0140557
Journal: PLoS One
Volume: 10
Issue number: 10
DOI: 10.1371/journal.pone.0140557
State: Published - Oct 14 2015

Fingerprint

regulatory sequences
Position-Specific Scoring Matrices
High-Throughput Nucleotide Sequencing
deoxyribonucleases
Nucleotide Motifs
DNA-binding domains
Technology
genomics
DNA
Datasets

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space. / Karnik, Rahul; Beer, Michael.

In: PLoS One, Vol. 10, No. 10, e0140557, 14.10.2015.


@article{6fb5bb3898a84fa9bddedf17cb7a1c85,
title = "Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space",
abstract = "The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.",
author = "Rahul Karnik and Michael Beer",
year = "2015",
month = "10",
day = "14",
doi = "10.1371/journal.pone.0140557",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "10",

}

TY - JOUR

T1 - Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space

AU - Karnik, Rahul

AU - Beer, Michael

PY - 2015/10/14

Y1 - 2015/10/14

AB - The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

UR - http://www.scopus.com/inward/record.url?scp=84949033800&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949033800&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0140557

DO - 10.1371/journal.pone.0140557

M3 - Article

C2 - 26465884

AN - SCOPUS:84949033800

VL - 10

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 10

M1 - e0140557

ER -