Finding anchors for genomic sequence comparison

Ross A. Lippert, Xiaoyue Zhao, Liliana Florea, Clark Mobarry, Sorin Istrail

Research output: Contribution to conference › Paper › peer-review

9 Scopus citations

Abstract

Recent sequencing of the human and other mammalian genomes has made it necessary to align them in order to identify and characterize their commonalities and differences. Programs that align whole genomes generally use a seed-and-extend technique: they start from exact or near-exact matches, select a reliable subset of these, called anchors, and then fill in the remaining portions between the anchors using a combination of local and global alignment algorithms. So far, however, their choices of parameters have been primarily heuristic. We present a statistical framework and practical methods for selecting a set of matches that is both sensitive and specific and can serve as a reliable set of anchors for a one-to-one mapping of two genomes, from which a whole-genome alignment can be built. Starting from exact matches, we introduce a novel per-base repeat annotation, the Z-score, and use it to explore noise- and repeat-filtering conditions. We also evaluate dynamic programming-based chaining algorithms as context-based filters. We apply these methods to the comparison of two progressive assemblies of the human genome, NCBI build 28 and build 34 (http://genome.ucsc.edu), and show that a significant portion of the two genomes can be found in selected exact matches, with a very limited amount of sequence duplication.
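To make the idea of chaining as a context-based filter concrete, here is a minimal, generic colinear-chaining sketch in Python. It is an illustration under stated assumptions, not the authors' algorithm: it assumes each exact match is a hypothetical `Match(a, b, length)` record (start in genome A, start in genome B, length), and the hypothetical function `chain_matches` keeps the maximum-total-length chain of matches that increase and do not overlap in both genomes. Matches that do not fit the main chain, such as hits from repeated sequence, are dropped.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical exact match: A[a:a+length] == B[b:b+length]."""
    a: int
    b: int
    length: int


def chain_matches(matches: List[Match]) -> List[Match]:
    """Return a maximum-total-length chain of matches that are colinear
    and non-overlapping in both sequences (simple O(n^2) dynamic program)."""
    if not matches:
        return []
    ms = sorted(matches)            # order by start in A, then start in B
    n = len(ms)
    best = [m.length for m in ms]   # best chain score ending at match i
    prev = [-1] * n                 # back-pointer to the previous match in that chain
    for i in range(n):
        for j in range(i):
            # j may precede i only if it ends before i starts in BOTH genomes
            if (ms[j].a + ms[j].length <= ms[i].a
                    and ms[j].b + ms[j].length <= ms[i].b
                    and best[j] + ms[i].length > best[i]):
                best[i] = best[j] + ms[i].length
                prev[i] = j
    # Trace back from the highest-scoring chain end
    i = max(range(n), key=lambda k: best[k])
    chain = []
    while i != -1:
        chain.append(ms[i])
        i = prev[i]
    chain.reverse()
    return chain


if __name__ == "__main__":
    matches = [
        Match(0, 0, 10),
        Match(15, 15, 10),
        Match(12, 100, 8),   # off-diagonal hit, e.g. a repeat copy
        Match(40, 40, 5),
    ]
    print(chain_matches(matches))
```

Here the off-diagonal match `Match(12, 100, 8)` is excluded and the chain `[Match(0, 0, 10), Match(15, 15, 10), Match(40, 40, 5)]` (total length 25) is kept. The quadratic loop is chosen for readability; practical whole-genome tools typically use faster sparse dynamic-programming variants.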

Original language: English (US)
Pages: 233-241
Number of pages: 9
State: Published - Jun 1 2004
Externally published: Yes
Event: RECOMB 2004 - Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology - San Diego, CA, United States
Duration: Mar 27 2004 to Mar 31 2004


Keywords

  • MUMs
  • Suffix trees
  • Whole-genome alignments

ASJC Scopus subject areas

  • General Computer Science
  • General Biochemistry, Genetics and Molecular Biology
