Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space

Rahul Karnik; Michael A. Beer

doi:10.1371/journal.pone.0140557

School of Medicine

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
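
As a rough illustration of what "a full PWM with a learned threshold" can mean in practice, the following Python sketch scores sequences against a log-odds PWM and picks the score cutoff that best separates a positive set from a negative set. It is not the authors' implementation: the objective used here (balanced accuracy), the function names `build_pwm`, `best_score` and `learn_threshold`, the uniform background, and the toy sequences are all assumptions made for this example, and the dynamic search space described in the abstract is not modeled.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM (one dict per position) from aligned sites."""
    width = len(sites[0])
    pwm = []
    for pos in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm

def best_score(pwm, seq):
    """Highest log-odds score over all windows of seq (forward strand only)."""
    width = len(pwm)
    return max(
        sum(col[base] for col, base in zip(pwm, seq[i:i + width]))
        for i in range(len(seq) - width + 1)
    )

def learn_threshold(pwm, positives, negatives):
    """Pick the cutoff maximizing balanced accuracy (an assumed stand-in
    objective, not the discriminative objective from the paper)."""
    pos = [best_score(pwm, s) for s in positives]
    neg = [best_score(pwm, s) for s in negatives]
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(pos + neg)):
        tpr = sum(s >= cut for s in pos) / len(pos)   # positives called bound
        tnr = sum(s < cut for s in neg) / len(neg)    # negatives called unbound
        acc = (tpr + tnr) / 2
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc

# Toy data, invented purely for illustration.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
positives = ["GGTGACTCAGG", "ACTGAGTCATT", "CCTGACTCACC"]
negatives = ["GGGGCCCCGGG", "ACACACACACA", "TTTTGGGGCCC"]

pwm = build_pwm(sites)
cutoff, accuracy = learn_threshold(pwm, positives, negatives)
print(f"cutoff={cutoff:.2f} balanced_accuracy={accuracy:.2f}")
```

Scoring with a full matrix lets each position contribute a graded weight, which a fixed k-mer word or regular expression cannot express; the cutoff is then chosen from the data rather than set in advance.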

Original language	English (US)
Article number	e0140557
Journal	PloS one
Volume	10
Issue number	10
DOIs	https://doi.org/10.1371/journal.pone.0140557
State	Published - Oct 14 2015

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences
General

Cite this

@article{6fb5bb3898a84fa9bddedf17cb7a1c85,

title = "Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space",

abstract = "The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.",

author = "Karnik, Rahul and Beer, {Michael A.}",

note = "Funding Information: We would like to thank our lab members Dongwon Lee, Mahmoud Ghandi, Donavan Cheng, and Jun Kyu Rhee for useful discussions and constructive feedback. We would also like to thank Dongwon Lee for his assistance in generating the negative sequence datasets. M. Beer was supported by the Searle Scholars program and NIH grant HG007348. Publisher Copyright: {\textcopyright} 2015 Karnik, Beer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2015",

month = oct,

day = "14",

doi = "10.1371/journal.pone.0140557",

language = "English (US)",

volume = "10",

journal = "PloS one",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "10",

}

TY - JOUR

T1 - Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space

AU - Karnik, Rahul

AU - Beer, Michael A.

N1 - Funding Information: We would like to thank our lab members Dongwon Lee, Mahmoud Ghandi, Donavan Cheng, and Jun Kyu Rhee for useful discussions and constructive feedback. We would also like to thank Dongwon Lee for his assistance in generating the negative sequence datasets. M. Beer was supported by the Searle Scholars program and NIH grant HG007348. Publisher Copyright: © 2015 Karnik, Beer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2015/10/14

Y1 - 2015/10/14

AB - The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.

UR - http://www.scopus.com/inward/record.url?scp=84949033800&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84949033800&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0140557

DO - 10.1371/journal.pone.0140557

M3 - Article

C2 - 26465884

AN - SCOPUS:84949033800

SN - 1932-6203

VL - 10

JO - PloS one

JF - PloS one

IS - 10

M1 - e0140557

ER -
