Genome annotation of Anopheles gambiae using mass spectrometry-derived data

Dário E. Kalume, Suraj Peri, Raghunath Reddy, Jun Zhong, Mobolaji Okulate, Nirbhay Kumar, Akhilesh Pandey

Research output: Contribution to journal › Article

Abstract

Background: A large number of animal and plant genomes have been completely sequenced over the last decade and are now publicly available. Although genomes can be rapidly sequenced, identifying protein-coding genes still remains a problematic task. Availability of protein sequence data allows direct confirmation of protein-coding genes. Mass spectrometry has recently emerged as a powerful tool for proteomic studies. Protein identification using mass spectrometry is usually carried out by searching against databases of known proteins or transcripts. This approach generally does not allow identification of proteins that have not yet been predicted or whose transcripts have not been identified.

Results: We searched 3,967 mass spectra from 16 LC-MS/MS runs of Anopheles gambiae salivary gland homogenates against the Anopheles gambiae genome database. This allowed us to validate 23 known transcripts and 50 novel transcripts. In addition, a novel gene was identified on the basis of peptides that matched a genomic region where no gene was known and no transcript had been predicted. The amino termini of proteins encoded by two predicted transcripts were confirmed based on N-terminally acetylated peptides sequenced by tandem mass spectrometry. Finally, six sequence polymorphisms could be annotated based on experimentally obtained peptide sequences.

Conclusion: The peptide sequences from this study were mapped onto the genomic sequence using the distributed annotation system available at Ensembl and can be visualized in the context of all other existing annotations. The strategy described in this paper can be used to correct and confirm genome annotations and permit discovery of novel proteins in a high-throughput manner by mass spectrometry.
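The idea of matching peptides directly against genomic sequence can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how the genome database was searched, so the six-frame translation, the exact string matching (standing in for real spectrum scoring), and the names `translate`, `reverse_complement`, and `find_peptide` are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate DNA codon by codon; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_peptide(dna, peptide):
    """Locate a peptide in all six reading frames (hypothetical helper).

    Returns (strand, frame, start, end) tuples, where start and end are
    0-based, end-exclusive coordinates on the forward strand.
    """
    dna = dna.upper()
    n = len(dna)
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            idx = protein.find(peptide)
            while idx != -1:
                start = frame + 3 * idx
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, frame, start, end))
                idx = protein.find(peptide, idx + 1)
    return hits


# Toy example: the peptide MKV is encoded in forward frame 2.
toy_genome = "CCATGAAAGTTTAA"
hits = find_peptide(toy_genome, "MKV")
```

A hit that falls in a region with no annotated gene or predicted transcript would correspond to the kind of novel evidence the Results describe. A real search would score spectra against candidate peptides and would have to handle peptides that span exon boundaries, which this sketch ignores.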

Original language: English (US)
Journal: BMC Genomics
Volume: 6
DOIs: https://doi.org/10.1186/1471-2164-6-128
State: Published - Sep 19 2005

Fingerprint

Anopheles gambiae
Genome
Proteins
Mass Spectrometry
Peptides
Plant Genome
Computer Communication Networks
Protein Databases
Tandem Mass Spectrometry
Salivary Glands
Proteomics
Genes
Databases

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Kalume, D. E., Peri, S., Reddy, R., Zhong, J., Okulate, M., Kumar, N., & Pandey, A. (2005). Genome annotation of Anopheles gambiae using mass spectrometry-derived data. BMC Genomics, 6. https://doi.org/10.1186/1471-2164-6-128

@article{f67421d2079c401e833f2917e5048460,
title = "Genome annotation of Anopheles gambiae using mass spectrometry-derived data",
author = "Kalume, {D{\'a}rio E.} and Suraj Peri and Raghunath Reddy and Jun Zhong and Mobolaji Okulate and Nirbhay Kumar and Akhilesh Pandey",
year = "2005",
month = "9",
day = "19",
doi = "10.1186/1471-2164-6-128",
language = "English (US)",
volume = "6",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Genome annotation of Anopheles gambiae using mass spectrometry-derived data

AU - Kalume, Dário E.

AU - Peri, Suraj

AU - Reddy, Raghunath

AU - Zhong, Jun

AU - Okulate, Mobolaji

AU - Kumar, Nirbhay

AU - Pandey, Akhilesh

PY - 2005/9/19

Y1 - 2005/9/19

UR - http://www.scopus.com/inward/record.url?scp=25444475024&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=25444475024&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-6-128

DO - 10.1186/1471-2164-6-128

M3 - Article

VL - 6

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

ER -