Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b

David M. McGaughey, Ryan M. Vinton, Jimmy Huynh, Amr Al-Saif, Michael Beer, Andrew S McCallion

Research output: Contribution to journal (Article)

Abstract

Despite its recognized utility, the extent to which evolutionary sequence conservation-based approaches may systematically overlook functional noncoding sequences remains unclear. We have tiled across sequence encompassing the zebrafish phox2b gene, ultimately evaluating 48 amplicons corresponding to all noncoding sequences therein for enhancer activity in zebrafish. Post hoc analyses of this interval utilizing five commonly used measures of evolutionary constraint (AVID, MLAGAN, SLAGAN, phastCons, WebMCS) demonstrate that each systematically overlooks regulatory sequences. These established algorithms detected only 29%-61% of our identified regulatory elements, consistent with the suggestion that many regulatory sequences may not be readily detected by metrics of sequence constraint. However, we were able to discriminate functional from nonfunctional sequences based upon GC composition and identified position weight matrices (PWM), demonstrating that, in at least one case, deleting sequences containing a subset of these PWMs from one identified regulatory element abrogated its regulatory function. Collectively, these data demonstrate that the noncoding functional component of vertebrate genomes may far exceed estimates predicated on evolutionary constraint.

Original language: English (US)
Pages (from-to): 252-260
Number of pages: 9
Journal: Genome Research
Volume: 18
Issue number: 2
DOI: 10.1101/gr.6929408
State: Published - Feb 2008

Fingerprint

Zebrafish
Genome Components
Position-Specific Scoring Matrices
Base Composition
Vertebrates
Genes

ASJC Scopus subject areas

  • Genetics

Cite this

Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. / McGaughey, David M.; Vinton, Ryan M.; Huynh, Jimmy; Al-Saif, Amr; Beer, Michael; McCallion, Andrew S.

In: Genome Research, Vol. 18, No. 2, 02.2008, p. 252-260.

McGaughey, David M.; Vinton, Ryan M.; Huynh, Jimmy; Al-Saif, Amr; Beer, Michael; McCallion, Andrew S. / Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. In: Genome Research. 2008; Vol. 18, No. 2. pp. 252-260.

@article{7fed728786c7419e8a42750a04db8865,
title = "Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b",
abstract = "Despite its recognized utility, the extent to which evolutionary sequence conservation-based approaches may systematically overlook functional noncoding sequences remains unclear. We have tiled across sequence encompassing the zebrafish phox2b gene, ultimately evaluating 48 amplicons corresponding to all noncoding sequences therein for enhancer activity in zebrafish. Post hoc analyses of this interval utilizing five commonly used measures of evolutionary constraint (AVID, MLAGAN, SLAGAN, phastCons, WebMCS) demonstrate that each systematically overlooks regulatory sequences. These established algorithms detected only 29{\%}-61{\%} of our identified regulatory elements, consistent with the suggestion that many regulatory sequences may not be readily detected by metrics of sequence constraint. However, we were able to discriminate functional from nonfunctional sequences based upon GC composition and identified position weight matrices (PWM), demonstrating that, in at least one case, deleting sequences containing a subset of these PWMs from one identified regulatory element abrogated its regulatory function. Collectively, these data demonstrate that the noncoding functional component of vertebrate genomes may far exceed estimates predicated on evolutionary constraint.",
author = "McGaughey, {David M.} and Vinton, {Ryan M.} and Jimmy Huynh and Amr Al-Saif and Michael Beer and McCallion, {Andrew S}",
year = "2008",
month = "2",
doi = "10.1101/gr.6929408",
language = "English (US)",
volume = "18",
pages = "252--260",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "2",

}

TY - JOUR
T1 - Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b
AU - McGaughey, David M.
AU - Vinton, Ryan M.
AU - Huynh, Jimmy
AU - Al-Saif, Amr
AU - Beer, Michael
AU - McCallion, Andrew S
PY - 2008/2
Y1 - 2008/2
AB - Despite its recognized utility, the extent to which evolutionary sequence conservation-based approaches may systematically overlook functional noncoding sequences remains unclear. We have tiled across sequence encompassing the zebrafish phox2b gene, ultimately evaluating 48 amplicons corresponding to all noncoding sequences therein for enhancer activity in zebrafish. Post hoc analyses of this interval utilizing five commonly used measures of evolutionary constraint (AVID, MLAGAN, SLAGAN, phastCons, WebMCS) demonstrate that each systematically overlooks regulatory sequences. These established algorithms detected only 29%-61% of our identified regulatory elements, consistent with the suggestion that many regulatory sequences may not be readily detected by metrics of sequence constraint. However, we were able to discriminate functional from nonfunctional sequences based upon GC composition and identified position weight matrices (PWM), demonstrating that, in at least one case, deleting sequences containing a subset of these PWMs from one identified regulatory element abrogated its regulatory function. Collectively, these data demonstrate that the noncoding functional component of vertebrate genomes may far exceed estimates predicated on evolutionary constraint.

UR - http://www.scopus.com/inward/record.url?scp=39049147650&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=39049147650&partnerID=8YFLogxK
U2 - 10.1101/gr.6929408
DO - 10.1101/gr.6929408
M3 - Article
C2 - 18071029
AN - SCOPUS:39049147650
VL - 18
SP - 252
EP - 260
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 2
ER -