Triplet repeat length bias and variation in the human transcriptome

Michael Molla, Arthur Delcher, Shamil Sunyaev, Charles Cantor, Simon Kasif

Research output: Contribution to journal › Article

Abstract

Length variation in short tandem repeats (STRs) is an important family of DNA polymorphisms with numerous applications in genetics, medicine, forensics, and evolutionary analysis. Several major diseases have been associated with length variation of trinucleotide (triplet) repeats, including Huntington's disease, hereditary ataxias, and spinobulbar muscular atrophy. Using the reference human genome, we have catalogued all triplet repeats in genic regions. These data revealed a bias in noncoding DNA repeat lengths. They also enabled a survey of repeat-length polymorphisms (RLPs) in human genomes and a comparison of the rate of polymorphism in humans versus divergence from chimpanzee. For short repeats, this analysis of three human genomes reveals a relatively low RLP rate in exons and, somewhat surprisingly, in introns. All short RLPs observed in multiple genomes are biallelic (at least in this small sample). In contrast, long repeats are highly polymorphic and some long RLPs are multiallelic. For long repeats, the chimpanzee sequence frequently differs from all observed human alleles. This suggests a high expansion/contraction rate in all long repeats. Expansions and contractions are not, however, affected by any natural selection discernible from our comparison of human-chimpanzee divergence with human RLPs. Our catalog of human triplet repeats and their flanking regions can be used to produce a cost-effective whole-genome assay to test individuals. This repeat assay could someday complement SNP arrays in tests that assess an individual's risk of developing a disease, or become part of a personalized genomic strategy that provides therapeutic guidance with respect to drug response.
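
The cataloguing and allele-classification steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `Repeat` record, and the `min_copies=3` threshold are assumptions made for illustration, and the abstract does not state how repeats were detected or where the short/long boundary lies.

```python
from collections import namedtuple

# Hypothetical record type: 0-based start, repeated 3-base unit, copy count.
Repeat = namedtuple("Repeat", "start unit copies")


def find_triplet_repeats(seq, min_copies=3):
    """Scan a DNA string for perfect tandem repeats of a 3-base unit.

    min_copies is an assumed threshold, not a value from the paper.
    Homopolymer units (e.g. AAA) are skipped because they are
    mononucleotide runs rather than true triplet repeats.
    """
    seq = seq.upper()
    repeats = []
    i = 0
    while i + 3 <= len(seq):
        unit = seq[i:i + 3]
        if len(set(unit)) == 1 or any(base not in "ACGT" for base in unit):
            i += 1
            continue
        # Count how many consecutive copies of the unit follow position i.
        copies = 1
        while seq[i + 3 * copies:i + 3 * copies + 3] == unit:
            copies += 1
        if copies >= min_copies:
            repeats.append(Repeat(i, unit, copies))
            i += 3 * copies  # jump past the repeat just recorded
        else:
            i += 1
    return repeats


def classify_locus(copy_numbers):
    """Label a locus by the number of distinct repeat lengths observed
    across the sampled genomes (one copy number per genome)."""
    distinct = len(set(copy_numbers))
    if distinct == 1:
        return "monomorphic"
    if distinct == 2:
        return "biallelic"
    return "multiallelic"


if __name__ == "__main__":
    print(find_triplet_repeats("TTCAGCAGCAGCAGTT"))
    print(classify_locus([4, 4, 5]))
```

The scan reports each repeat in the reading frame where it is first encountered, so the same tract could be reported as CAG, AGC, or GCA depending on the flanking sequence. A real catalogue would also need gene annotations to restrict the search to genic regions and to separate exons from introns.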

Original language: English (US)
Pages (from-to): 17095-17100
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 106
Issue number: 40
DOI: 10.1073/pnas.0907112106
State: Published - Oct 6 2009
Externally published: Yes

Fingerprint

Trinucleotide Repeats
Transcriptome
Pan troglodytes
Human Genome
Forensic Genetics
Atrophic Muscular Disorders
Genome
Spinocerebellar Degenerations
Forensic Medicine
Genetic Selection
DNA
Huntington Disease
Microsatellite Repeats
Introns
Single Nucleotide Polymorphism
Exons
Alleles
Costs and Cost Analysis
Pharmaceutical Preparations

Keywords

  • Computational biology
  • Genome
  • Genomics
  • Polymorphisms
  • Tandem repeats

ASJC Scopus subject areas

  • General

Cite this

Triplet repeat length bias and variation in the human transcriptome. / Molla, Michael; Delcher, Arthur; Sunyaev, Shamil; Cantor, Charles; Kasif, Simon.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 106, No. 40, 06.10.2009, p. 17095-17100.

Molla, Michael ; Delcher, Arthur ; Sunyaev, Shamil ; Cantor, Charles ; Kasif, Simon. / Triplet repeat length bias and variation in the human transcriptome. In: Proceedings of the National Academy of Sciences of the United States of America. 2009 ; Vol. 106, No. 40. pp. 17095-17100.
@article{9e751609b6c24e8d8e0f712b8141ac66,
title = "Triplet repeat length bias and variation in the human transcriptome",
keywords = "Computational biology, Genome, Genomics, Polymorphisms, Tandem repeats",
author = "Michael Molla and Arthur Delcher and Shamil Sunyaev and Charles Cantor and Simon Kasif",
year = "2009",
month = "10",
day = "6",
doi = "10.1073/pnas.0907112106",
language = "English (US)",
volume = "106",
pages = "17095--17100",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "40",

}

TY - JOUR
T1 - Triplet repeat length bias and variation in the human transcriptome
AU - Molla, Michael
AU - Delcher, Arthur
AU - Sunyaev, Shamil
AU - Cantor, Charles
AU - Kasif, Simon
PY - 2009/10/6
Y1 - 2009/10/6
KW - Computational biology
KW - Genome
KW - Genomics
KW - Polymorphisms
KW - Tandem repeats
UR - http://www.scopus.com/inward/record.url?scp=70350125551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350125551&partnerID=8YFLogxK
U2 - 10.1073/pnas.0907112106
DO - 10.1073/pnas.0907112106
M3 - Article
C2 - 19805156
AN - SCOPUS:70350125551
VL - 106
SP - 17095
EP - 17100
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
SN - 0027-8424
IS - 40
ER -