Abstract
Motivation: N-linked glycosylation occurs predominantly at the N-X-T/S motif, where X is any amino acid except proline. Not all N-X-T/S sequons are glycosylated, and a number of web servers have been developed to predict N-linked glycan occupancy from sequence and/or residue pattern information. None of the currently available servers, however, uses protein structural information to predict N-glycan occupancy.

Results: Here, we describe a novel classifier algorithm, NGlycPred, for the prediction of glycan occupancy at N-X-T/S sequons. The algorithm uses both structural and residue pattern information and was trained on a set of glycosylated protein structures using the Random Forest algorithm. The best predictor achieved a balanced accuracy of 0.687 under 10-fold cross-validation on a curated dataset of 479 N-X-T/S sequons and outperformed sequence-based predictors when evaluated on the same dataset. The incorporation of structural information, including local contact order, surface accessibility/composition and secondary structure, thus improves the prediction accuracy of glycan occupancy at the N-X-T/S consensus sequon.
Original language | English (US) |
---|---|
Article number | bts426 |
Pages (from-to) | 2249-2255 |
Number of pages | 7 |
Journal | Bioinformatics |
Volume | 28 |
Issue number | 17 |
State | Published - Sep 2012 |
Externally published | Yes |
ASJC Scopus subject areas
- Biochemistry
- Molecular Biology
- Computational Theory and Mathematics
- Computer Science Applications
- Computational Mathematics
- Statistics and Probability
- General Medicine