MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data

Chunguang G. Yang; Stephen J. Granite; Jennifer E. Van Eyk; Raimond L. Winslow

doi:10.1002/pmic.200600157

School of Medicine

Contribution to journal: Article (peer-reviewed)

5 Scopus citations

Abstract

Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
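The abstract describes three design ideas: search results become objects, redundant entries in the input are collapsed, and new search engines are supported by adding parsers to an extensible design. The Java sketch below illustrates those ideas under stated assumptions only. All class names (`Protein`, `Peptide`, `PeptideMatch`, `SearchResult`, `ResultParser`, `TabSeparatedParser`) and the tab-separated input format are hypothetical. The sketch does not read real MASCOT HTML or XML, does not write to a database, and is not the published PDOM API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified object model loosely inspired by the PDOM idea.
// Names and fields are illustrative assumptions, not the published PDOM classes.
public class PdomSketch {

    /** A protein hit, identified by its database accession. */
    static final class Protein {
        final String accession;
        final String description;
        Protein(String accession, String description) {
            this.accession = accession;
            this.description = description;
        }
    }

    /** A peptide sequence; one sequence may be reported under several proteins. */
    static final class Peptide {
        final String sequence;
        Peptide(String sequence) { this.sequence = sequence; }
    }

    /** Links one peptide to one protein with a score. */
    static final class PeptideMatch {
        final Peptide peptide;
        final Protein protein;
        final double score;
        PeptideMatch(Peptide peptide, Protein protein, double score) {
            this.peptide = peptide;
            this.protein = protein;
            this.score = score;
        }
    }

    /** One search result; interns objects so each protein and peptide is stored once. */
    static final class SearchResult {
        private final Map<String, Protein> proteins = new LinkedHashMap<>();
        private final Map<String, Peptide> peptides = new LinkedHashMap<>();
        private final List<PeptideMatch> matches = new ArrayList<>();

        // Returns the existing Protein for this accession, or creates it (first description wins).
        Protein protein(String accession, String description) {
            return proteins.computeIfAbsent(accession, a -> new Protein(a, description));
        }

        // Returns the existing Peptide for this sequence, or creates it.
        Peptide peptide(String sequence) {
            return peptides.computeIfAbsent(sequence, Peptide::new);
        }

        void addMatch(String sequence, String accession, String description, double score) {
            matches.add(new PeptideMatch(peptide(sequence), protein(accession, description), score));
        }

        int proteinCount() { return proteins.size(); }
        int peptideCount() { return peptides.size(); }
        List<PeptideMatch> matches() { return Collections.unmodifiableList(matches); }
    }

    /** Extension point: one implementation per search engine or report format. */
    interface ResultParser {
        SearchResult parse(List<String> lines);
    }

    /** Toy parser for a hypothetical "sequence<TAB>accession<TAB>description<TAB>score" export. */
    static final class TabSeparatedParser implements ResultParser {
        @Override
        public SearchResult parse(List<String> lines) {
            SearchResult result = new SearchResult();
            for (String line : lines) {
                String[] f = line.split("\t");
                if (f.length != 4) continue; // skip malformed rows
                result.addMatch(f[0], f[1], f[2], Double.parseDouble(f[3]));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Invented placeholder rows; not real MASCOT output.
        List<String> rows = List.of(
            "PEPTIDEK\tACC001\tExample protein 1\t72.1",
            "PEPTIDEK\tACC002\tExample protein 2\t65.0",
            "SAMPLER\tACC001\tExample protein 1\t48.3");
        SearchResult r = new TabSeparatedParser().parse(rows);
        System.out.println(r.proteinCount() + " proteins, " + r.peptideCount()
            + " peptides, " + r.matches().size() + " matches");
    }
}
```

Keying proteins by accession and peptides by sequence means a peptide reported under several proteins is held as a single shared object, which is one simple way to remove redundancy before any database export. Supporting another search engine in this sketch would mean adding another `ResultParser` implementation.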

Original language	English (US)
Pages (from-to)	5688-5693
Number of pages	6
Journal	Proteomics
Volume	6
Issue number	21
DOIs	https://doi.org/10.1002/pmic.200600157
State	Published - Nov 2006

Keywords

HTML parser
Java
MASCOT parser
Protein identification data object model
XML parser

ASJC Scopus subject areas

Biochemistry
Molecular Biology

Cite this

@article{86800fb65eb34390a54ffc23ae26bda1,
  title = "MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data",
  abstract = "Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.",
  keywords = "HTML parser, Java, MASCOT parser, Protein identification data object model, XML parser",
  author = "Yang, {Chunguang G.} and Granite, {Stephen J.} and {Van Eyk}, {Jennifer E.} and Winslow, {Raimond L.}",
  year = "2006",
  month = nov,
  doi = "10.1002/pmic.200600157",
  language = "English (US)",
  volume = "6",
  pages = "5688--5693",
  journal = "Proteomics",
  issn = "1615-9853",
  publisher = "Wiley-VCH Verlag",
  number = "21",
}

TY  - JOUR
T1  - MASCOT HTML and XML parser
T2  - An implementation of a novel object model for protein identification data
AU  - Yang, Chunguang G.
AU  - Granite, Stephen J.
AU  - Van Eyk, Jennifer E.
AU  - Winslow, Raimond L.
PY  - 2006/11
Y1  - 2006/11
N2  - Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
AB  - Protein identification using MS is an important technique in proteomics as well as a major generator of proteomics data. We have designed the protein identification data object model (PDOM) and developed a parser based on this model to facilitate the analysis and storage of these data. The parser works with HTML or XML files saved or exported from MASCOT MS/MS ions search in peptide summary report or MASCOT PMF search in protein summary report. The program creates PDOM objects, eliminates redundancy in the input file, and has the capability to output any PDOM object to a relational database. This program facilitates additional analysis of MASCOT search results and aids the storage of protein identification information. The implementation is extensible and can serve as a template to develop parsers for other search engines. The parser can be used as a stand-alone application or can be driven by other Java programs. It is currently being used as the front end for a system that loads HTML and XML result files of MASCOT searches into a relational database. The source code is freely available at http://www.ccbm.jhu.edu and the program uses only free and open-source Java libraries.
KW  - HTML parser
KW  - Java
KW  - MASCOT parser
KW  - Protein identification data object model
KW  - XML parser
UR  - http://www.scopus.com/inward/record.url?scp=33751077285&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33751077285&partnerID=8YFLogxK
U2  - 10.1002/pmic.200600157
DO  - 10.1002/pmic.200600157
M3  - Article
C2  - 17006878
AN  - SCOPUS:33751077285
SN  - 1615-9853
VL  - 6
SP  - 5688
EP  - 5693
JO  - Proteomics
JF  - Proteomics
IS  - 21
ER  - 
