Domain size distributions can predict domain boundaries

S. J. Wheelan; A. Marchler-Bauer; S. H. Bryant

doi:10.1093/bioinformatics/16.7.613

Research output: Contribution to journal › Article › peer-review

150 Scopus citations

Abstract

Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries.

Results: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
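The enumerate-and-score idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the Gaussian size distribution, its parameters (MEAN_LEN, SD_LEN), the minimum domain length, and all function names are hypothetical placeholders chosen only to make the example runnable. The sketch also considers only continuous, single-segment domains and at most one boundary, so the segment-number term mentioned in the abstract is left out.

```python
import math

# Hypothetical placeholder parameters, NOT taken from the paper:
# a Gaussian over domain length, used only to make the sketch runnable.
MEAN_LEN = 150.0
SD_LEN = 50.0


def log_size_prob(length: int) -> float:
    """Log-density of a domain of the given length under the toy size model."""
    z = (length - MEAN_LEN) / SD_LEN
    return -0.5 * z * z - math.log(SD_LEN * math.sqrt(2.0 * math.pi))


def rank_two_domain_splits(seq_len: int, min_len: int = 30):
    """Enumerate every single cut point and score the two resulting domains.

    A cut at position c yields domains of length c and seq_len - c.
    Returns (cut, log_likelihood) pairs sorted from most to least likely.
    """
    scored = []
    for cut in range(min_len, seq_len - min_len + 1):
        ll = log_size_prob(cut) + log_size_prob(seq_len - cut)
        scored.append((cut, ll))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


def best_partition(seq_len: int, min_len: int = 30):
    """Compare the one-domain hypothesis with the best two-domain split.

    Returns a list of predicted boundary positions (empty if one domain wins).
    """
    one_domain = log_size_prob(seq_len)
    splits = rank_two_domain_splits(seq_len, min_len)
    if splits and splits[0][1] > one_domain:
        return [splits[0][0]]
    return []


if __name__ == "__main__":
    for length in (150, 400):
        print(length, best_partition(length))
```

Under these toy parameters a 150-residue sequence is scored as a single domain, while a 400-residue sequence is split at its midpoint, because two domains near the assumed mean length are more likely than one very long domain. A realistic version would replace the Gaussian with an empirical size distribution and extend the enumeration to more boundaries and to discontinuous domains.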

Original language	English (US)
Pages (from-to)	613-618
Number of pages	6
Journal	Bioinformatics
Volume	16
Issue number	7
DOIs	https://doi.org/10.1093/bioinformatics/16.7.613
State	Published - 2000
Externally published	Yes

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Access to Document

10.1093/bioinformatics/16.7.613

Cite this

@article{0efdec890cb2440d87b99daec405a483,
  title = "Domain size distributions can predict domain boundaries",
  abstract = "Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80\% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries. Results: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.",
  author = "Wheelan, {S. J.} and Marchler-Bauer, A. and Bryant, {S. H.}",
  note = "Funding Information: The authors wish to thank Anna Panchenko for useful discussions and the NIH intramural research program for support.",
  year = "2000",
  doi = "10.1093/bioinformatics/16.7.613",
  language = "English (US)",
  volume = "16",
  pages = "613--618",
  journal = "Bioinformatics",
  issn = "1367-4803",
  publisher = "Oxford University Press",
  number = "7",
}

TY  - JOUR
T1  - Domain size distributions can predict domain boundaries
AU  - Wheelan, S. J.
AU  - Marchler-Bauer, A.
AU  - Bryant, S. H.
N1  - Funding Information: The authors wish to thank Anna Panchenko for useful discussions and the NIH intramural research program for support.
PY  - 2000
Y1  - 2000
N2  - Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries. Results: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
AB  - Motivation: The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries. Results: To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
UR  - http://www.scopus.com/inward/record.url?scp=0033753811&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0033753811&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/16.7.613
DO  - 10.1093/bioinformatics/16.7.613
M3  - Article
C2  - 11038331
AN  - SCOPUS:0033753811
SN  - 1367-4803
VL  - 16
SP  - 613
EP  - 618
JO  - Bioinformatics
JF  - Bioinformatics
IS  - 7
ER  -
