MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data

Chunguang G. Yang, Stephen J. Granite, Jennifer E. Van Eyk, Raimond Winslow

Research output: Contribution to journal › Article

Abstract

Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
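
The redundancy elimination mentioned in the abstract can be pictured with a minimal Java sketch. This is not the published PDOM API: the class names (`PdomSketch`, `SearchResult`, `ProteinHit`, `PeptideMatch`), the choice of query number plus sequence as the peptide key, and all accessions, sequences, and scores are hypothetical and chosen only for illustration. The sketch assumes that a report may list the same peptide match under several protein hits, and it stores each protein and each peptide match once while letting protein hits share references.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; names and structure are illustrative, not the published PDOM. */
public class PdomSketch {

    /** One peptide match, identified here (by assumption) by query number and sequence. */
    static final class PeptideMatch {
        final int queryNumber;
        final String sequence;
        final double ionScore;

        PeptideMatch(int queryNumber, String sequence, double ionScore) {
            this.queryNumber = queryNumber;
            this.sequence = sequence;
            this.ionScore = ionScore;
        }

        String key() {
            return queryNumber + ":" + sequence;
        }
    }

    /** One protein hit that references shared PeptideMatch objects. */
    static final class ProteinHit {
        final String accession;
        final List<PeptideMatch> peptides = new ArrayList<>();

        ProteinHit(String accession) {
            this.accession = accession;
        }
    }

    /** Holds each protein hit and each peptide match exactly once. */
    static final class SearchResult {
        private final Map<String, ProteinHit> proteins = new LinkedHashMap<>();
        private final Map<String, PeptideMatch> peptides = new LinkedHashMap<>();

        /** Records one (protein, peptide) row from a report, reusing existing objects. */
        void addRow(String accession, int queryNumber, String sequence, double ionScore) {
            ProteinHit protein = proteins.computeIfAbsent(accession, ProteinHit::new);
            PeptideMatch candidate = new PeptideMatch(queryNumber, sequence, ionScore);
            PeptideMatch shared = peptides.computeIfAbsent(candidate.key(), k -> candidate);
            // The shared instance is compared by reference, so a repeated row adds nothing.
            if (!protein.peptides.contains(shared)) {
                protein.peptides.add(shared);
            }
        }

        int proteinCount() {
            return proteins.size();
        }

        int peptideCount() {
            return peptides.size();
        }

        ProteinHit protein(String accession) {
            return proteins.get(accession);
        }
    }

    public static void main(String[] args) {
        SearchResult result = new SearchResult();
        // Made-up rows: accessions, sequences, and scores are placeholders.
        result.addRow("PROT_A", 12, "PEPTIDEK", 54.0);
        result.addRow("PROT_A", 12, "PEPTIDEK", 54.0); // exact repeat of the first row
        result.addRow("PROT_B", 12, "PEPTIDEK", 54.0); // same peptide under another protein
        result.addRow("PROT_A", 31, "SAMPLER", 41.0);
        System.out.println(result.proteinCount() + " proteins, "
                + result.peptideCount() + " unique peptide matches");
    }
}
```

Keeping one object per unique peptide match means a later step, such as writing to a relational database, would insert each peptide row once and link it to its protein hits, rather than repeating the peptide for every protein it appears under.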

Original language: English (US)
Pages (from-to): 5688-5693
Number of pages: 6
Journal: Proteomics
Volume: 6
Issue number: 21
DOI: 10.1002/pmic.200600157
State: Published - Nov 2006

Keywords

  • HTML parser
  • Java
  • MASCOT parser
  • Protein identification data object model
  • XML parser

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data. / Yang, Chunguang G.; Granite, Stephen J.; Van Eyk, Jennifer E.; Winslow, Raimond.

In: Proteomics, Vol. 6, No. 21, 11.2006, p. 5688-5693.

@article{86800fb65eb34390a54ffc23ae26bda1,
title = "MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data",
keywords = "HTML parser, Java, MASCOT parser, Protein identification data object model, XML parser",
author = "Yang, {Chunguang G.} and Granite, {Stephen J.} and {Van Eyk}, {Jennifer E.} and Raimond Winslow",
year = "2006",
month = "11",
doi = "10.1002/pmic.200600157",
language = "English (US)",
volume = "6",
pages = "5688--5693",
journal = "Proteomics",
issn = "1615-9853",
publisher = "Wiley-VCH Verlag",
number = "21",

}

TY - JOUR
T1 - MASCOT HTML and XML parser
T2 - An implementation of a novel object model for protein identification data
AU - Yang, Chunguang G.
AU - Granite, Stephen J.
AU - Van Eyk, Jennifer E.
AU - Winslow, Raimond
PY - 2006/11
Y1 - 2006/11
KW - HTML parser
KW - Java
KW - MASCOT parser
KW - Protein identification data object model
KW - XML parser
UR - http://www.scopus.com/inward/record.url?scp=33751077285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751077285&partnerID=8YFLogxK
U2 - 10.1002/pmic.200600157
DO - 10.1002/pmic.200600157
M3 - Article
C2 - 17006878
AN - SCOPUS:33751077285
VL - 6
SP - 5688
EP - 5693
JO - Proteomics
JF - Proteomics
SN - 1615-9853
IS - 21
ER -