Data maximization by multipass analysis of protein mass spectra

Ravi Tharakan, Nathan Edwards, David Graham

Research output: Contribution to journal (Article)

Abstract

With the proliferation of search engines for the analysis of MS data, multisearch techniques aimed at boosting the discriminating power of the search engines' score functions have recently become popular. Much statistical and algorithmic work has been done, therefore, in order to be able to combine and parse multiple search streams. However, multisearch techniques suffer from long run times, and may have little impact on false negatives because of similar peptide filtering heuristics between searches. This review focuses, rather, on multipass techniques, which use the results of one search to guide the selection of spectra, parameters and sequences in subsequent searches. This reduces the number of false-negative peptide identifications due to peptide candidate filtering while preserving statistical significance of existing (correct) identifications. Furthermore, this technique avoids substantial increases in running time and, by limiting the search space, does not reduce the statistical significance of correct identifications or introduce a statistically significant number of false-positive identifications. However, we argue that the existing combiner tools are not reliably applicable to these multipass situations, because of algorithmic assumptions about search space and statistical assumptions about the rate of true positives. Here we provide an overview of the advantages of and issues in multipass analysis techniques, the existing methods and workflows available to proteomic researchers, and the unsolved statistical and algorithmic issues amenable to future research.
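The multipass idea described in the abstract (results of one search guiding the spectra, parameters and sequences used in the next) can be illustrated with a toy two-pass workflow. This is only a minimal sketch under stated assumptions, not the authors' method or any real search engine's API: `Spectrum`, `toy_search`, `multipass_search`, the precursor-mass matching rule, and the mass-shift value are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    scan_id: str
    precursor_mass: float


def toy_search(spectra, peptides, tolerance, mass_shifts=(0.0,)):
    """Hypothetical stand-in for a search engine.

    Assigns each spectrum the first peptide whose mass, plus one of the
    allowed shifts, lies within `tolerance` of the precursor mass.
    `peptides` is a list of (protein, peptide, mass) tuples.
    Returns {scan_id: (protein, peptide)}.
    """
    hits = {}
    for spectrum in spectra:
        for protein, peptide, mass in peptides:
            if any(abs(spectrum.precursor_mass - (mass + shift)) <= tolerance
                   for shift in mass_shifts):
                hits[spectrum.scan_id] = (protein, peptide)
                break
    return hits


def multipass_search(spectra, peptides, tolerance=0.01,
                     second_pass_shifts=(0.0, 15.995)):
    # Pass 1: every spectrum, full sequence list, restrictive parameters
    # (no mass shifts allowed).
    first = toy_search(spectra, peptides, tolerance)

    # Pass 1 results guide pass 2: only still-unidentified spectra are
    # searched again, and only against proteins already identified.
    leftover = [s for s in spectra if s.scan_id not in first]
    found_proteins = {protein for protein, _ in first.values()}
    restricted = [p for p in peptides if p[0] in found_proteins]

    # Pass 2: relaxed parameters (an illustrative mass shift is allowed),
    # but on a much smaller search space.
    second = toy_search(leftover, restricted, tolerance, second_pass_shifts)

    # Pass 1 identifications are kept unchanged; pass 2 only adds new ones.
    return {**second, **first}
```

In this sketch the second pass relaxes the matching parameters while shrinking both the spectrum set and the sequence set, which mirrors the abstract's point that limiting the search space keeps running time down. It does not attempt any significance estimation, which the abstract identifies as an open issue for multipass results.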

Original language: English (US)
Pages (from-to): 1160-1171
Number of pages: 12
Journal: Proteomics
Volume: 10
Issue number: 6
DOI: 10.1002/pmic.200900433
State: Published - Mar 2010

Keywords

  • Bioinformatics
  • Database search
  • Protein

ASJC Scopus subject areas

  • Molecular Biology
  • Biochemistry

Cite this

Data maximization by multipass analysis of protein mass spectra. / Tharakan, Ravi; Edwards, Nathan; Graham, David.

In: Proteomics, Vol. 10, No. 6, 03.2010, p. 1160-1171.

Tharakan, Ravi ; Edwards, Nathan ; Graham, David. / Data maximization by multipass analysis of protein mass spectra. In: Proteomics. 2010 ; Vol. 10, No. 6. pp. 1160-1171.
@article{2019f5ea12c240e3b8f783be15034e02,
title = "Data maximization by multipass analysis of protein mass spectra",
abstract = "With the proliferation of search engines for the analysis of MS data, multisearch techniques aimed at boosting the discriminating power of the search engines' score functions have recently become popular. Much statistical and algorithmic work has been done, therefore, in order to be able to combine and parse multiple search streams. However, multisearch techniques suffer from long run times, and may have little impact on false negatives because of similar peptide filtering heuristics between searches. This review focuses, rather, on multipass techniques, which use the results of one search to guide the selection of spectra, parameters and sequences in subsequent searches. This reduces the number of false-negative peptide identifications due to peptide candidate filtering while preserving statistical significance of existing (correct) identifications. Furthermore, this technique avoids substantial increases in running time and, by limiting the search space, does not reduce the statistical significance of correct identifications or introduce a statistically significant number of false-positive identifications. However, we argue that the existing combiner tools are not reliably applicable to these multipass situations, because of algorithmic assumptions about search space and statistical assumptions about the rate of true positives. Here we provide an overview of the advantages of and issues in multipass analysis techniques, the existing methods and workflows available to proteomic researchers, and the unsolved statistical and algorithmic issues amenable to future research.",
keywords = "Bioinformatics, Database search, Protein",
author = "Ravi Tharakan and Nathan Edwards and David Graham",
year = "2010",
month = "3",
doi = "10.1002/pmic.200900433",
language = "English (US)",
volume = "10",
pages = "1160--1171",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag",
number = "6",

}

TY - JOUR

T1 - Data maximization by multipass analysis of protein mass spectra

AU - Tharakan, Ravi

AU - Edwards, Nathan

AU - Graham, David

PY - 2010/3

Y1 - 2010/3

N2 - With the proliferation of search engines for the analysis of MS data, multisearch techniques aimed at boosting the discriminating power of the search engines' score functions have recently become popular. Much statistical and algorithmic work has been done, therefore, in order to be able to combine and parse multiple search streams. However, multisearch techniques suffer from long run times, and may have little impact on false negatives because of similar peptide filtering heuristics between searches. This review focuses, rather, on multipass techniques, which use the results of one search to guide the selection of spectra, parameters and sequences in subsequent searches. This reduces the number of false-negative peptide identifications due to peptide candidate filtering while preserving statistical significance of existing (correct) identifications. Furthermore, this technique avoids substantial increases in running time and, by limiting the search space, does not reduce the statistical significance of correct identifications or introduce a statistically significant number of false-positive identifications. However, we argue that the existing combiner tools are not reliably applicable to these multipass situations, because of algorithmic assumptions about search space and statistical assumptions about the rate of true positives. Here we provide an overview of the advantages of and issues in multipass analysis techniques, the existing methods and workflows available to proteomic researchers, and the unsolved statistical and algorithmic issues amenable to future research.

AB - With the proliferation of search engines for the analysis of MS data, multisearch techniques aimed at boosting the discriminating power of the search engines' score functions have recently become popular. Much statistical and algorithmic work has been done, therefore, in order to be able to combine and parse multiple search streams. However, multisearch techniques suffer from long run times, and may have little impact on false negatives because of similar peptide filtering heuristics between searches. This review focuses, rather, on multipass techniques, which use the results of one search to guide the selection of spectra, parameters and sequences in subsequent searches. This reduces the number of false-negative peptide identifications due to peptide candidate filtering while preserving statistical significance of existing (correct) identifications. Furthermore, this technique avoids substantial increases in running time and, by limiting the search space, does not reduce the statistical significance of correct identifications or introduce a statistically significant number of false-positive identifications. However, we argue that the existing combiner tools are not reliably applicable to these multipass situations, because of algorithmic assumptions about search space and statistical assumptions about the rate of true positives. Here we provide an overview of the advantages of and issues in multipass analysis techniques, the existing methods and workflows available to proteomic researchers, and the unsolved statistical and algorithmic issues amenable to future research.

KW - Bioinformatics

KW - Database search

KW - Protein

UR - http://www.scopus.com/inward/record.url?scp=77949752005&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949752005&partnerID=8YFLogxK

U2 - 10.1002/pmic.200900433

DO - 10.1002/pmic.200900433

M3 - Article

C2 - 20082346

AN - SCOPUS:77949752005

VL - 10

SP - 1160

EP - 1171

JO - Proteomics

JF - Proteomics

SN - 1615-9853

IS - 6

ER -