Sim4cc: A cross-species spliced alignment program

Leming Zhou, Mihaela Pertea, Arthur L. Delcher, Liliana Florea

Research output: Contribution to journal (Article)

Abstract

Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of gene and protein resources needed to annotate them. Direct comparison and alignment of existing cDNA sequences from a related species is an effective and readily available means to determine genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes problems in the earlier tools by incorporating three new features: universal spaced seeds, to increase sensitivity and allow comparisons between species at various evolutionary distances, and powerful splice signal models and evolutionarily-aware alignment techniques, to improve the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity compared to existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while being comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.

Original language: English (US)
Article number: e80
Journal: Nucleic Acids Research
Volume: 37
Issue number: 11
State: Published - 2009

ASJC Scopus subject areas

  • Genetics
