Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

Brian J. Haas, Arthur L. Delcher, Stephen M. Mount, Jennifer R. Wortman, Roger K. Smith, Linda I. Hannick, Rama Maiti, Catherine M. Ronning, Douglas B. Rusch, Christopher D. Town, Steven L. Salzberg, Owen White

Research output: Contribution to journal (Article)

Abstract

The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.
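The abstract describes assembling overlapping, compatible transcript alignments into maximal alignment assemblies but does not spell out the procedure. The following Python sketch illustrates the general idea only. It assumes that each alignment is a sorted list of exon segments from one strand of one locus. It also assumes that two alignments are compatible when their spans overlap and they cover exactly the same bases inside the shared span, so any intron in that region must match. The helper names (span, clip, compatible, merge, assemble) are hypothetical. The greedy sweep, which folds each alignment into every assembly it is compatible with, is a simplification and not PASA's published algorithm.

```python
from typing import List, Tuple

# Hypothetical representation: a spliced alignment is a sorted list of
# (start, end) exon segments, inclusive genomic coordinates, one strand.
Exon = Tuple[int, int]
Alignment = List[Exon]


def span(aln: Alignment) -> Tuple[int, int]:
    """Genomic extent (first exon start, last exon end) of an alignment."""
    return aln[0][0], aln[-1][1]


def clip(aln: Alignment, lo: int, hi: int) -> Alignment:
    """Restrict exon segments to the window [lo, hi], dropping empty pieces."""
    clipped = []
    for start, end in aln:
        s, e = max(start, lo), min(end, hi)
        if s <= e:
            clipped.append((s, e))
    return clipped


def compatible(a: Alignment, b: Alignment) -> bool:
    """Assumed rule: spans overlap and both alignments cover identical bases
    inside the shared span, so introns in that region must match exactly."""
    lo = max(span(a)[0], span(b)[0])
    hi = min(span(a)[1], span(b)[1])
    return lo <= hi and clip(a, lo, hi) == clip(b, lo, hi)


def merge(a: Alignment, b: Alignment) -> Alignment:
    """Union of the exon segments of two compatible alignments."""
    merged: Alignment = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or abutting segment: extend the current exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def assemble(alignments: List[Alignment]) -> List[Alignment]:
    """Greedy sweep by start coordinate: fold each alignment into every
    compatible assembly, or open a new assembly if none is compatible."""
    assemblies: List[Alignment] = []
    for aln in sorted(alignments, key=span):
        placed = False
        for i, asm in enumerate(assemblies):
            if compatible(asm, aln):
                assemblies[i] = merge(asm, aln)
                placed = True
        if not placed:
            assemblies.append(list(aln))
    return assemblies


if __name__ == "__main__":
    a = [(1, 100), (201, 300), (401, 500)]  # hypothetical full-length cDNA
    b = [(1, 100), (251, 300), (401, 500)]  # alternative acceptor site
    c = [(450, 500), (601, 700)]            # EST extending the 3' end
    for asm in assemble([a, b, c]):
        print(asm)
    # prints:
    # [(1, 100), (201, 300), (401, 500), (601, 700)]
    # [(1, 100), (251, 300), (401, 500), (601, 700)]
```

Alignments whose splice sites disagree (a and b) end up in separate assemblies, which is how alternative splicing variants surface. A fragment compatible with both (c) extends each of them. Unlike a true maximal-assembly procedure, this sweep does not check whether one assembly is contained in another.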

Original language: English (US)
Pages (from-to): 5654-5666
Number of pages: 13
Journal: Nucleic Acids Research
Volume: 31
Issue number: 19
DOIs: https://doi.org/10.1093/nar/gkg770
State: Published - Oct 1 2003
Externally published: Yes

Fingerprint

Molecular Sequence Annotation
Arabidopsis
Genome
Untranslated Regions
Sequence Alignment
Expressed Sequence Tags
Alternative Splicing
Genes
Complementary DNA
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

Haas, B. J., Delcher, A. L., Mount, S. M., Wortman, J. R., Smith, R. K., Hannick, L. I., ... White, O. (2003). Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research, 31(19), 5654-5666. https://doi.org/10.1093/nar/gkg770

Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. / Haas, Brian J.; Delcher, Arthur L.; Mount, Stephen M.; Wortman, Jennifer R.; Smith, Roger K.; Hannick, Linda I.; Maiti, Rama; Ronning, Catherine M.; Rusch, Douglas B.; Town, Christopher D.; Salzberg, Steven L.; White, Owen.

In: Nucleic Acids Research, Vol. 31, No. 19, 01.10.2003, p. 5654-5666.

Research output: Contribution to journal (Article)

Haas, BJ, Delcher, AL, Mount, SM, Wortman, JR, Smith, RK, Hannick, LI, Maiti, R, Ronning, CM, Rusch, DB, Town, CD, Salzberg, SL & White, O 2003, 'Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies', Nucleic Acids Research, vol. 31, no. 19, pp. 5654-5666. https://doi.org/10.1093/nar/gkg770
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick LI et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research. 2003 Oct 1;31(19):5654-5666. https://doi.org/10.1093/nar/gkg770
Haas, Brian J. ; Delcher, Arthur L. ; Mount, Stephen M. ; Wortman, Jennifer R. ; Smith, Roger K. ; Hannick, Linda I. ; Maiti, Rama ; Ronning, Catherine M. ; Rusch, Douglas B. ; Town, Christopher D. ; Salzberg, Steven L. ; White, Owen. / Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. In: Nucleic Acids Research. 2003 ; Vol. 31, No. 19. pp. 5654-5666.
@article{1ba8b0c7f6c749f3839b40082ede681d,
title = "Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies",
abstract = "The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.",
author = "Haas, {Brian J.} and Delcher, {Arthur L.} and Mount, {Stephen M.} and Wortman, {Jennifer R.} and Smith, {Roger K.} and Hannick, {Linda I.} and Rama Maiti and Ronning, {Catherine M.} and Rusch, {Douglas B.} and Town, {Christopher D.} and Salzberg, {Steven L.} and Owen White",
year = "2003",
month = "10",
day = "1",
doi = "10.1093/nar/gkg770",
language = "English (US)",
volume = "31",
pages = "5654--5666",
journal = "Nucleic Acids Research",
issn = "1362-4962",
publisher = "Oxford University Press",
number = "19",

}

TY - JOUR

T1 - Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies

AU - Haas, Brian J.

AU - Delcher, Arthur L.

AU - Mount, Stephen M.

AU - Wortman, Jennifer R.

AU - Smith, Roger K.

AU - Hannick, Linda I.

AU - Maiti, Rama

AU - Ronning, Catherine M.

AU - Rusch, Douglas B.

AU - Town, Christopher D.

AU - Salzberg, Steven L.

AU - White, Owen

PY - 2003/10/1

Y1 - 2003/10/1

AB - The spliced alignment of expressed sequence data to genomic sequence has proven a key tool in the comprehensive annotation of genes in eukaryotic genomes. A novel algorithm was developed to assemble clusters of overlapping transcript alignments (ESTs and full-length cDNAs) into maximal alignment assemblies, thereby comprehensively incorporating all available transcript data and capturing subtle splicing variations. Complete and partial gene structures identified by this method were used to improve The Institute for Genomic Research Arabidopsis genome annotation (TIGR release v.4.0). The alignment assemblies permitted the automated modeling of several novel genes and >1000 alternative splicing variations as well as updates (including UTR annotations) to nearly half of the ∼27 000 annotated protein coding genes. The algorithm of the Program to Assemble Spliced Alignments (PASA) tool is described, as well as the results of automated updates to Arabidopsis gene annotations.

UR - http://www.scopus.com/inward/record.url?scp=0141905891&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0141905891&partnerID=8YFLogxK

U2 - 10.1093/nar/gkg770

DO - 10.1093/nar/gkg770

M3 - Article

C2 - 14500829

AN - SCOPUS:0141905891

VL - 31

SP - 5654

EP - 5666

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 1362-4962

IS - 19

ER -