An information measure of the quality of protein secondary structure prediction

Rosemarie Swanson, Ioannis Kagiampakis, Jerry W. Tsai

Research output: Contribution to journal (Article)

Abstract

We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple, intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view of why a prediction succeeds or fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and has perhaps unfairly given the general impression that secondary structure prediction has reached its limit, since SOV has not improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have become more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably with each round. In terms of information, secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective on accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of the difficulty as well as the final quality of a prediction.
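The abstract does not give the RELINFO formula, so the following is only a minimal sketch of the general idea it describes, not the authors' definition. It assumes three-state strings (H, E, C), computes Q3 as percent correct, and uses the mutual information between observed and predicted states, in bits, to derive an illustrative "restriction factor" of 2 raised to that information. All function names here (`q3`, `mutual_information_bits`, `restriction_factor`) are hypothetical.

```python
from collections import Counter
from math import log2


def _check(observed: str, predicted: str) -> None:
    """Reject empty or mismatched inputs before scoring."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    _check(observed, predicted)
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def mutual_information_bits(observed: str, predicted: str) -> float:
    """Mutual information I(observed; predicted) in bits, from empirical counts."""
    _check(observed, predicted)
    n = len(observed)
    joint = Counter(zip(observed, predicted))
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    # Sum p(o, p) * log2( p(o, p) / (p(o) * p(p)) ) over observed pairs only.
    return sum(
        (c / n) * log2(c * n / (obs_counts[o] * pred_counts[p]))
        for (o, p), c in joint.items()
    )


def restriction_factor(observed: str, predicted: str) -> float:
    """Illustrative stand-in (an assumption, not RELINFO itself):
    2**I, the factor by which the prediction narrows the effective
    number of secondary structure choices per residue."""
    return 2 ** mutual_information_bits(observed, predicted)


if __name__ == "__main__":
    observed = "CCCCCCCCHH"
    all_coil = "CCCCCCCCCC"  # a trivial predictor that always says coil
    print(q3(observed, all_coil), restriction_factor(observed, all_coil))
```

In this toy case the always-coil predictor scores well on percent correct while conveying no information about the observed states, which is the kind of overestimation by Q3 that the abstract mentions.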

Original language: English (US)
Pages (from-to): 65-79
Number of pages: 15
Journal: Journal of Computational Biology
Volume: 15
Issue number: 1
DOI: 10.1089/cmb.2007.0199
State: Published - Jan 1 2008
Externally published: Yes

Fingerprint

Information Theory
Secondary Protein Structure
Measures of Information
Structure Prediction
Protein Structure
Secondary Structure
Proteins
Overlap
Prediction
Large Set
Percent
Replacement
Experiment
Saturation
Intuitive
Complement

Keywords

  • Bits
  • Choice
  • Effective number of choices
  • Entropy
  • Intuitive meaning
  • Mutual information
  • Percent correct
  • Q3
  • SOV

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

An information measure of the quality of protein secondary structure prediction. / Swanson, Rosemarie; Kagiampakis, Ioannis; Tsai, Jerry W.

In: Journal of Computational Biology, Vol. 15, No. 1, 01.01.2008, p. 65-79.

@article{03c39baa0de7416c8737ec7f8e378818,
title = "An information measure of the quality of protein secondary structure prediction",
keywords = "Bits, Choice, Effective number of choices, Entropy, Intuitive meaning, Mutual information, Percent correct, Q3, SOV",
author = "Rosemarie Swanson and Ioannis Kagiampakis and Tsai, {Jerry W.}",
year = "2008",
month = "1",
day = "1",
doi = "10.1089/cmb.2007.0199",
language = "English (US)",
volume = "15",
pages = "65--79",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "1",

}

TY - JOUR

T1 - An information measure of the quality of protein secondary structure prediction

AU - Swanson, Rosemarie

AU - Kagiampakis, Ioannis

AU - Tsai, Jerry W.

PY - 2008/1/1

Y1 - 2008/1/1

KW - Bits

KW - Choice

KW - Effective number of choices

KW - Entropy

KW - Intuitive meaning

KW - Mutual information

KW - Percent correct

KW - Q3

KW - SOV

UR - http://www.scopus.com/inward/record.url?scp=39449102990&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=39449102990&partnerID=8YFLogxK

U2 - 10.1089/cmb.2007.0199

DO - 10.1089/cmb.2007.0199

M3 - Article

C2 - 18199024

AN - SCOPUS:39449102990

VL - 15

SP - 65

EP - 79

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 1

ER -