TY - JOUR
T1 - The dominance of the population by a selected few
T2 - power-law behaviour applies to a wide variety of genomic properties
AU - Luscombe, Nicholas M.
AU - Qian, Jiang
AU - Zhang, Zhaolei
AU - Johnson, Ted
AU - Gerstein, Mark
PY - 2002/7/25
Y1 - 2002/7/25
AB - BACKGROUND: The sequencing of genomes provides us with an inventory of the 'molecular parts' in nature, such as protein families and folds, and their functions in living organisms. Through the analysis of such inventories, it has been shown that different genomes have very different usage of parts; for example, the common folds in the worm are very different from those in Escherichia coli. RESULTS: Despite these differences, we find that the genomic occurrence of generalized parts follows a well-known mathematical framework called the power law, with a few parts occurring many times and most occurring only a few times. This observation is true in a wide variety of genomic contexts. Earlier studies found power laws in a few specific cases, such as the occurrence of protein families. Here, we find many further cases of power-law behavior, for example in the occurrence of pseudogenes and in levels of gene expression. We show comprehensively that this behavior applies across many different genomes, for many different types of parts (DNA words, InterPro families, protein superfamilies and folds, pseudogene families and pseudomotifs), and for the many disparate attributes associated with these parts (their functions, interactions and expression levels). CONCLUSIONS: Power-law behavior provides a concise mathematical description of an important biological feature: the sheer dominance of a few members over the overall population. We present this behavior in a unified framework and propose that all these observations are connected to an underlying DNA duplication process as genomes evolved to their current state.
UR - http://www.scopus.com/inward/record.url?scp=0242713700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0242713700&partnerID=8YFLogxK
M3 - Article
C2 - 12186647
AN - SCOPUS:0242713700
SN - 1474-7596
VL - 3
SP - RESEARCH0040
JO - Genome Biology
JF - Genome Biology
IS - 8
ER -