TY - JOUR
T1 - Identification of predictive cis-regulatory elements using a discriminative objective function and a dynamic search space
AU - Karnik, Rahul
AU - Beer, Michael A.
N1 - Funding Information:
We would like to thank our lab members Dongwon Lee, Mahmoud Ghandi, Donavan Cheng, and Jun Kyu Rhee for useful discussions and constructive feedback. We would also like to thank Dongwon Lee for his assistance in generating the negative sequence datasets. M. Beer was supported by the Searle Scholars program and NIH grant HG007348.
Publisher Copyright:
© 2015 Karnik, Beer. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2015/10/14
Y1 - 2015/10/14
N2 - The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
AB - The generation of genomic binding or accessibility data from massively parallel sequencing technologies such as ChIP-seq and DNase-seq continues to accelerate. Yet state-of-the-art computational approaches for the identification of DNA binding motifs often yield motifs of weak predictive power. Here we present a novel computational algorithm called MotifSpec, designed to find predictive motifs, in contrast to over-represented sequence elements. The key distinguishing feature of this algorithm is that it uses a dynamic search space and a learned threshold to find discriminative motifs in combination with the modeling of motifs using a full PWM (position weight matrix) rather than k-mer words or regular expressions. We demonstrate that our approach finds motifs corresponding to known binding specificities in several mammalian ChIP-seq datasets, and that our PWMs classify the ChIP-seq signals with accuracy comparable to, or marginally better than, motifs from the best existing algorithms. In other datasets, our algorithm identifies novel motifs where other methods fail. Finally, we apply this algorithm to detect motifs from expression datasets in C. elegans using a dynamic expression similarity metric rather than fixed expression clusters, and find novel predictive motifs.
UR - http://www.scopus.com/inward/record.url?scp=84949033800&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84949033800&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0140557
DO - 10.1371/journal.pone.0140557
M3 - Article
C2 - 26465884
AN - SCOPUS:84949033800
SN - 1932-6203
VL - 10
JO - PLoS ONE
JF - PLoS ONE
IS - 10
M1 - e0140557
ER -