Redefining CpG islands using hidden Markov models

Hao Wu, Brian Caffo, Harris A. Jaffee, Rafael A. Irizarry, Andrew P. Feinberg

Research output: Contribution to journal > Article > peer-review

107 Scopus citations

Abstract

The DNA of most vertebrates is depleted in CpG dinucleotides: a C followed by a G in the 5′ to 3′ direction. CpGs are the target of DNA methylation, a chemical modification of cytosine (C) that is heritable during cell division and is the best-characterized epigenetic mechanism. The remaining CpGs tend to cluster in regions referred to as CpG islands (CGI). Knowing CGI locations is important because they mark functionally relevant epigenetic loci in development and disease. For various mammals, including human, a widely used list of CGI is readily available from the UCSC Genome Browser. This list was derived using algorithms that search for regions satisfying a definition of CGI proposed by Gardiner-Garden and Frommer more than 20 years ago. Recent findings, enabled by advances in technology that permit direct measurement of epigenetic endpoints at a whole-genome scale, motivate the need to adapt the current CGI definition. In this paper, we propose a procedure, guided by hidden Markov models, that permits an extensible approach to detecting CGI. The main advantage of our approach over others is that it summarizes the evidence for CGI status as probability scores. This provides flexibility in the definition of a CGI and facilitates the creation of CGI lists for other species. The utility of this approach is demonstrated by generating the first CGI lists for invertebrates and by showing that we can create CGI lists that substantially increase overlap with recently discovered epigenetic marks.
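
To illustrate what "probability scores" from a hidden Markov model can look like, here is a minimal sketch in Python. It is not the model from the paper: it assumes a toy two-state chain (background, island) whose only observation at each position is whether a CpG dinucleotide starts there. The function names and every numeric parameter (`p_cpg`, `stay`, `start`) are hypothetical values chosen for illustration. The forward-backward recursion then gives, for each position, the posterior probability of being in the island state.

```python
def cpg_indicator(seq):
    """Return 1 where a CpG dinucleotide starts and 0 elsewhere."""
    s = seq.upper()
    return [1 if s[i:i + 2] == "CG" else 0 for i in range(len(s) - 1)]


def island_posterior(obs, p_cpg=(0.01, 0.10), stay=(0.99, 0.95),
                     start=(0.9, 0.1)):
    """Posterior P(island | all observations) for each position.

    States: 0 = background, 1 = island.
    p_cpg[k]: assumed probability that a CpG starts at a position in state k.
    stay[k]:  assumed probability of remaining in state k at the next position.
    start[k]: assumed initial state probabilities.
    All three are illustrative guesses, not estimates from real data.
    """
    n = len(obs)
    if n == 0:
        return []
    states = (0, 1)
    trans = [[stay[0], 1 - stay[0]],
             [1 - stay[1], stay[1]]]

    def emit(k, x):
        return p_cpg[k] if x else 1 - p_cpg[k]

    def normalize(v):
        total = sum(v)
        return [a / total for a in v]

    # Forward pass, rescaled at each step to avoid numerical underflow.
    fwd = [normalize([start[k] * emit(k, obs[0]) for k in states])]
    for t in range(1, n):
        fwd.append(normalize([
            emit(k, obs[t]) * sum(fwd[t - 1][j] * trans[j][k] for j in states)
            for k in states
        ]))

    # Backward pass, rescaled the same way.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = normalize([
            sum(trans[k][j] * emit(j, obs[t + 1]) * bwd[t + 1][j]
                for j in states)
            for k in states
        ])

    # Combine both passes and renormalize to get the island posterior.
    post = []
    for t in range(n):
        background = fwd[t][0] * bwd[t][0]
        island = fwd[t][1] * bwd[t][1]
        post.append(island / (background + island))
    return post


# Synthetic sequence: CpG-free flanks around a CpG-dense core.
toy_seq = "AT" * 100 + "CG" * 50 + "AT" * 100
scores = island_posterior(cpg_indicator(toy_seq))
```

Thresholding `scores` at different cutoffs yields stricter or more permissive island calls from the same fit, which is the kind of flexibility a probability score allows and a fixed rule-based definition does not.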

Original language: English (US)
Pages (from-to): 499-514
Number of pages: 16
Journal: Biostatistics
Volume: 11
Issue number: 3
State: Published - Jul 2010

Keywords

  • CpG island
  • Epigenetics
  • Hidden Markov model
  • Sequence analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
