Computational gene prediction using multiple sources of evidence

Jonathan E. Allen, Mihaela Pertea, Steven L. Salzberg

Research output: Contribution to journal > Article

Abstract

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.

Original language: English (US)
Pages (from-to): 142-148
Number of pages: 7
Journal: Genome Research
Volume: 14
Issue number: 1
DOI: 10.1101/gr.1562804
State: Published - Jan 2004

Fingerprint

Genes
Sequence Alignment
Expressed Sequence Tags
Arabidopsis
Complementary DNA
Genome
Sensitivity and Specificity
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

Computational gene prediction using multiple sources of evidence. / Allen, Jonathan E.; Pertea, Mihaela; Salzberg, Steven L.

In: Genome Research, Vol. 14, No. 1, 01.2004, p. 142-148.

@article{cdcfd538ca4a495eaaf2553bd3604cf8,
title = "Computational gene prediction using multiple sources of evidence",
abstract = "This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.",
author = "Allen, {Jonathan E.} and Pertea, Mihaela and Salzberg, {Steven L.}",
year = "2004",
month = "1",
doi = "10.1101/gr.1562804",
language = "English (US)",
volume = "14",
pages = "142--148",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "1",
}

TY  - JOUR
T1  - Computational gene prediction using multiple sources of evidence
AU  - Allen, Jonathan E.
AU  - Pertea, Mihaela
AU  - Salzberg, Steven L.
PY  - 2004/1
Y1  - 2004/1
AB  - This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice site predictions, and other evidence. Three different algorithms for combining evidence in the Combiner were implemented and tested on 1783 confirmed genes in Arabidopsis thaliana. Our results show that combining gene prediction evidence consistently outperforms even the best individual gene finder and, in some cases, can produce dramatic improvements in sensitivity and specificity.
UR  - http://www.scopus.com/inward/record.url?scp=0346505461&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0346505461&partnerID=8YFLogxK
U2  - 10.1101/gr.1562804
DO  - 10.1101/gr.1562804
M3  - Article
C2  - 14707176
AN  - SCOPUS:0346505461
VL  - 14
SP  - 142
EP  - 148
JO  - Genome Research
JF  - Genome Research
SN  - 1088-9051
IS  - 1
ER  - 