Abstract
We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple and intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view of why a prediction succeeds or fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and may have unfairly given the general impression that secondary structure prediction has reached its limit, since SOV has not improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have become more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably in each round. In terms of information, secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective on accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of the difficulty as well as the final quality of a prediction.
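The contrast between percent correct and an information-based score can be illustrated with a small sketch. The abstract does not give the RELINFO formula, so the code below is an assumption rather than the authors' method: it estimates the mutual information (in bits) between observed and predicted three-state strings and reports 2 raised to that value as a "restriction factor". The function names `q3`, `mutual_information_bits`, and `restriction_factor` are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def q3(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def entropy_bits(counts, total: int) -> float:
    """Shannon entropy in bits of a distribution given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def mutual_information_bits(observed: str, predicted: str) -> float:
    """Plug-in estimate of I(observed; predicted) = H(O) + H(P) - H(O, P)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(observed)
    h_obs = entropy_bits(Counter(observed).values(), n)
    h_pred = entropy_bits(Counter(predicted).values(), n)
    h_joint = entropy_bits(Counter(zip(observed, predicted)).values(), n)
    return h_obs + h_pred - h_joint


def restriction_factor(observed: str, predicted: str) -> float:
    """Assumed stand-in for the abstract's 'factor by which choice is restricted'."""
    return 2.0 ** mutual_information_bits(observed, predicted)


# A coil-only prediction on a mostly-coil chain: high Q3, no information.
observed = "CCCCCCCCHH"
all_coil = "CCCCCCCCCC"
print(q3(observed, all_coil))                  # 0.8
print(restriction_factor(observed, all_coil))  # 1.0 (no restriction of choice)

# A perfect prediction on a chain with equal thirds of H, E, and C.
balanced = "HHHHEEEECCCC"
print(restriction_factor(balanced, balanced))  # about 3 (three choices reduced to one)
```

In this toy case a constant prediction scores 80% correct while carrying zero bits about the observed states, which is one way a percent-correct score can overstate quality. A plug-in estimate on a single short chain is noisy; a real analysis would pool counts over many residues.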
Original language | English (US) |
---|---|
Pages (from-to) | 65-79 |
Number of pages | 15 |
Journal | Journal of Computational Biology |
Volume | 15 |
Issue number | 1 |
DOIs | 10.1089/cmb.2007.0199 |
State | Published - Jan 1 2008 |
Externally published | Yes |
Keywords
- Bits
- Choice
- Effective number of choices
- Entropy
- Intuitive meaning
- Mutual information
- Percent correct
- Q3
- SOV
ASJC Scopus subject areas
- Molecular Biology
- Genetics
Cite this
An information measure of the quality of protein secondary structure prediction. / Swanson, Rosemarie; Kagiampakis, Ioannis; Tsai, Jerry W.
In: Journal of Computational Biology, Vol. 15, No. 1, 01.01.2008, p. 65-79.
Research output: Contribution to journal › Article
TY - JOUR
T1 - An information measure of the quality of protein secondary structure prediction
AU - Swanson, Rosemarie
AU - Kagiampakis, Ioannis
AU - Tsai, Jerry W.
PY - 2008/1/1
Y1 - 2008/1/1
UR - http://www.scopus.com/inward/record.url?scp=39449102990&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=39449102990&partnerID=8YFLogxK
U2 - 10.1089/cmb.2007.0199
DO - 10.1089/cmb.2007.0199
M3 - Article
C2 - 18199024
AN - SCOPUS:39449102990
VL - 15
SP - 65
EP - 79
JO - Journal of Computational Biology
JF - Journal of Computational Biology
SN - 1066-5277
IS - 1
ER -