CORECLUST: Identification of the conserved CRM grammar together with prediction of gene regulation

Anna A. Nikulova; Alexander V. Favorov; Roman A. Sutormin; Vsevolod J. Makeev; Andrey A. Mironov

doi:10.1093/nar/gks235

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Original language	English (US)
Pages (from-to)	e93
Journal	Nucleic Acids Research
Volume	40
Issue number	12
DOIs	https://doi.org/10.1093/nar/gks235
State	Published - Jul 2012
Externally published	Yes

ASJC Scopus subject areas

Genetics

Cite this

@article{af343590a95441a78459688d33fa8543,
  title = "CORECLUST: Identification of the conserved CRM grammar together with prediction of gene regulation",
  abstract = "Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.",
  author = "Nikulova, {Anna A.} and Favorov, {Alexander V.} and Sutormin, {Roman A.} and Makeev, {Vsevolod J.} and Mironov, {Andrey A.}",
  note = "Funding Information: Programs 6 and 17 of the Russian Academy of Sciences; Russian Foundation of Basic Research [grant numbers 09-04-92742, 11-04-02016-a, 10-04-92663-IND_a and 11-04-02051-a]; State Contract of Russian Ministry of Education and Science [grant numbers 07.514.11.4007 and 07.514.11.4005]; Russian Academy of Science Presidium Program on Molecular and Cellular Biology; the Johns Hopkins University Framework for the Future; the Commonwealth Foundation and the SKCCC Center for Personalized Cancer Medicine. Funding for open access charge: Lomonosov Moscow State University.",
  year = "2012",
  month = jul,
  doi = "10.1093/nar/gks235",
  language = "English (US)",
  volume = "40",
  pages = "e93",
  journal = "Nucleic Acids Research",
  issn = "0305-1048",
  publisher = "Oxford University Press",
  number = "12",
}

TY  - JOUR
T1  - CORECLUST
T2  - Identification of the conserved CRM grammar together with prediction of gene regulation
AU  - Nikulova, Anna A.
AU  - Favorov, Alexander V.
AU  - Sutormin, Roman A.
AU  - Makeev, Vsevolod J.
AU  - Mironov, Andrey A.
N1  - Funding Information: Programs 6 and 17 of the Russian Academy of Sciences; Russian Foundation of Basic Research [grant numbers 09-04-92742, 11-04-02016-a, 10-04-92663-IND_a and 11-04-02051-a]; State Contract of Russian Ministry of Education and Science [grant numbers 07.514.11.4007 and 07.514.11.4005]; Russian Academy of Science Presidium Program on Molecular and Cellular Biology; the Johns Hopkins University Framework for the Future; the Commonwealth Foundation and the SKCCC Center for Personalized Cancer Medicine. Funding for open access charge: Lomonosov Moscow State University.
PY  - 2012/7
Y1  - 2012/7
N2  - Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.
AB  - Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.
UR  - http://www.scopus.com/inward/record.url?scp=84863195617&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84863195617&partnerID=8YFLogxK
U2  - 10.1093/nar/gks235
DO  - 10.1093/nar/gks235
M3  - Article
C2  - 22422836
AN  - SCOPUS:84863195617
SN  - 0305-1048
VL  - 40
SP  - e93
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
IS  - 12
ER  -
