kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

Christopher Fletez-Brant; Dongwon Lee; Andrew S. McCallion; Michael A. Beer

doi:10.1093/nar/gkt519

School of Medicine

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
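
As a rough illustration of what "kmer sequence features" can mean, the sketch below turns a DNA sequence into a vector of normalized kmer frequencies, the kind of input a support vector machine could be trained on. It is a minimal sketch under stated assumptions, not the kmer-SVM server's implementation: the function names `kmer_counts` and `feature_vector`, the choice to skip kmers containing non-ACGT characters, and the normalization by total count are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def feature_vector(seq, k):
    """Return normalized frequencies over all 4**k kmers in lexicographic order.

    Assumption for this sketch: kmers containing characters other than
    A, C, G, T (for example N) are ignored.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = kmer_counts(seq, k)
    total = sum(counts[w] for w in vocab) or 1  # avoid division by zero
    return [counts[w] / total for w in vocab]


if __name__ == "__main__":
    vec = feature_vector("ACGTAC", 2)
    print(len(vec), vec[:4])
```

Each sequence in a positive set (for example, bound regions) and a negative set (background regions) would be mapped to such a vector before training a classifier; the classifier and its kernel are outside the scope of this sketch.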

Original language	English (US)
Pages (from-to)	W544-W556
Journal	Nucleic Acids Research
Volume	41
Issue number	Web Server issue
DOIs	https://doi.org/10.1093/nar/gkt519
State	Published - Jul 2013

ASJC Scopus subject areas

Genetics

Access to Document

10.1093/nar/gkt519

Cite this

@article{deaf18fba7fc4267b976737d7ff473a9,

title = "{kmer-SVM}: a web server for identifying predictive regulatory sequence features in genomic data sets",

abstract = "Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.",

author = "Fletez-Brant, Christopher and Lee, Dongwon and McCallion, {Andrew S.} and Beer, {Michael A.}",

note = "Funding Information: A.S.M. and M.B. were funded in part by the National Institute of Neurological Disease and Stroke [NS062972]; A.S.M. received additional funding from the National Heart Lung and Blood Institute [HL111267]. M.B. was supported by the Searle Scholars Program. Funding for open access charge: NIH NINDS [NS062972].",

year = "2013",

month = jul,

doi = "10.1093/nar/gkt519",

language = "English (US)",

volume = "41",

pages = "W544--W556",

journal = "Nucleic Acids Research",

issn = "0305-1048",

publisher = "Oxford University Press",

number = "Web Server issue",

}

TY - JOUR

T1 - kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

AU - Fletez-Brant, Christopher

AU - Lee, Dongwon

AU - McCallion, Andrew S.

AU - Beer, Michael A.

N1 - Funding Information: A.S.M. and M.B. were funded in part by the National Institute of Neurological Disease and Stroke [NS062972]; A.S.M. received additional funding from the National Heart Lung and Blood Institute [HL111267]. M.B. was supported by the Searle Scholars Program. Funding for open access charge: NIH NINDS [NS062972].

PY - 2013/7

Y1 - 2013/7

AB - Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.

UR - http://www.scopus.com/inward/record.url?scp=84883588081&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883588081&partnerID=8YFLogxK

U2 - 10.1093/nar/gkt519

DO - 10.1093/nar/gkt519

M3 - Article

C2 - 23771147

AN - SCOPUS:84883588081

SN - 0305-1048

VL - 41

SP - W544

EP - W556

JO - Nucleic Acids Research

JF - Nucleic Acids Research

IS - Web Server issue

ER -
