Gene and alternative splicing annotation with AIR

Liliana D Florea, Valentina Di Francesco, Jason Miller, Russell Turner, Alison Yao, Michael Harris, Brian Walenz, Clark Mobarry, Gennady V. Merkulov, Rosane Charlab, Ian Dew, Zuoming Deng, Sorin Istrail, Peter Li, Granger Sutton

Research output: Contribution to journal (Article)

Abstract

Designing effective and accurate tools for identifying the functional and structural elements in a genome remains at the frontier of genome annotation owing to incompleteness and inaccuracy of the data, limitations in the computational models, and shifting paradigms in genomics, such as alternative splicing. We present a methodology for the automated annotation of genes and their alternatively spliced mRNA transcripts based on existing cDNA and protein sequence evidence from the same species or projected from a related species using syntenic mapping information. At the core of the method is the splice graph, a compact representation of a gene, its exons, introns, and alternatively spliced isoforms. The putative transcripts are enumerated from the graph and assigned confidence scores based on the strength of sequence evidence, and a subset of the high-scoring candidates are selected and promoted into the annotation. The method is highly selective, eliminating the unlikely candidates while retaining 98% of the high-quality mRNA evidence in well-formed transcripts, and produces annotation that is measurably more accurate than some evidence-based gene sets. The process is fast, accurate, and fully automated, and combines the traditionally distinct gene annotation and alternative splicing detection processes in a comprehensive and systematic way, thus considerably aiding in the ensuing manual curation efforts.
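The splice-graph idea in the abstract can be illustrated with a minimal sketch. It is not the AIR implementation: the exon coordinates, the function names, and especially the scoring rule (a candidate's score is the weakest evidence support among its introns) are assumptions made for illustration, since the abstract does not specify how confidence scores are computed.

```python
from collections import defaultdict

def build_splice_graph(evidence):
    """Build a toy splice graph from aligned evidence.

    evidence: list of transcripts, each a list of (start, end) exon
    tuples in genomic order. Nodes are exons; edges are introns.
    """
    edges = defaultdict(set)    # exon -> downstream exons joined by an intron
    support = defaultdict(int)  # intron (exon, exon) -> number of supporting sequences
    starts, ends = set(), set()
    for exons in evidence:
        starts.add(exons[0])
        ends.add(exons[-1])
        for a, b in zip(exons, exons[1:]):
            edges[a].add(b)
            support[(a, b)] += 1
    return edges, dict(support), starts, ends

def enumerate_transcripts(edges, starts, ends):
    """Yield every path from a first exon to a last exon (depth-first)."""
    def walk(path):
        node = path[-1]
        if node in ends:
            yield tuple(path)
        for nxt in sorted(edges.get(node, ())):
            yield from walk(path + [nxt])
    for s in sorted(starts):
        yield from walk([s])

def score(path, support):
    """Hypothetical confidence: weakest intron support along the path."""
    return min((support[(a, b)] for a, b in zip(path, path[1:])), default=0)

def select(candidates, support, min_score):
    """Keep candidates scoring at least min_score, best first."""
    scored = [(score(p, support), p) for p in candidates]
    return [p for s, p in sorted(scored, reverse=True) if s >= min_score]

# Made-up evidence: two sequences include exon e2, one skips it.
e1, e2, e3, e4 = (100, 200), (300, 400), (500, 600), (700, 800)
evidence = [[e1, e2, e3, e4], [e1, e3, e4], [e1, e2, e3, e4]]

edges, support, starts, ends = build_splice_graph(evidence)
candidates = list(enumerate_transcripts(edges, starts, ends))
promoted = select(candidates, support, min_score=2)
```

With this toy input the graph yields two candidate isoforms, the full four-exon transcript and the exon-skipping variant, and a threshold of 2 promotes only the former. A real pipeline would weigh evidence quality rather than simply counting sequences.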

Original language: English (US)
Pages (from-to): 54-66
Number of pages: 13
Journal: Genome Research
Volume: 15
Issue number: 1
DOI: https://doi.org/10.1101/gr.2889405
State: Published - Jan 2005
Externally published: Yes

Fingerprint

Molecular Sequence Annotation
Alternative Splicing
Genome
Messenger RNA
Genomics
Introns
Genes
Exons
Protein Isoforms
Complementary DNA
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

Florea, L. D., Di Francesco, V., Miller, J., Turner, R., Yao, A., Harris, M., ... Sutton, G. (2005). Gene and alternative splicing annotation with AIR. Genome Research, 15(1), 54-66. https://doi.org/10.1101/gr.2889405

@article{83502c09a5c647249ad6654bd7f9a3ac,
title = "Gene and alternative splicing annotation with AIR",
author = "Florea, {Liliana D} and {Di Francesco}, Valentina and Jason Miller and Russell Turner and Alison Yao and Michael Harris and Brian Walenz and Clark Mobarry and Merkulov, {Gennady V.} and Rosane Charlab and Ian Dew and Zuoming Deng and Sorin Istrail and Peter Li and Granger Sutton",
year = "2005",
month = "1",
doi = "10.1101/gr.2889405",
language = "English (US)",
volume = "15",
pages = "54--66",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "1",

}

TY - JOUR

T1 - Gene and alternative splicing annotation with AIR

AU - Florea, Liliana D

AU - Di Francesco, Valentina

AU - Miller, Jason

AU - Turner, Russell

AU - Yao, Alison

AU - Harris, Michael

AU - Walenz, Brian

AU - Mobarry, Clark

AU - Merkulov, Gennady V.

AU - Charlab, Rosane

AU - Dew, Ian

AU - Deng, Zuoming

AU - Istrail, Sorin

AU - Li, Peter

AU - Sutton, Granger

PY - 2005/1

Y1 - 2005/1

UR - http://www.scopus.com/inward/record.url?scp=19944433052&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=19944433052&partnerID=8YFLogxK

U2 - 10.1101/gr.2889405

DO - 10.1101/gr.2889405

M3 - Article

C2 - 15632090

AN - SCOPUS:19944433052

VL - 15

SP - 54

EP - 66

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 1

ER -