Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS

Haloom Rafehi, David J. Szmulewicz, Mark F. Bennett, Nara L.M. Sobreira, Kate Pope, Katherine R. Smith, Greta Gillies, Peter Diakumis, Egor Dolzhenko, Michael A. Eberle, María García Barcina, David P. Breen, Andrew M. Chancellor, Phillip D. Cremer, Martin B. Delatycki, Brent L. Fogel, Anna Hackett, G. Michael Halmagyi, Solange Kapetanovic, Anthony Lang, Stuart Mossman, Weiyi Mu, Peter Patrikios, Susan L. Perlman, Ian Rosemergy, Elsdon Storey, Shaun R.D. Watson, Michael A. Wilson, David S. Zee, David Valle, David J. Amor, Melanie Bahlo, Paul J. Lockhart

Research output: Contribution to journal, Article

Abstract

Genomic technologies such as next-generation sequencing (NGS) are revolutionizing molecular diagnostics and clinical medicine. However, these approaches have proven inefficient at identifying pathogenic repeat expansions. Here, we apply a collection of bioinformatics tools that can be utilized to identify either known or novel expanded repeat sequences in NGS data. We performed genetic studies of a cohort of 35 individuals from 22 families with a clinical diagnosis of cerebellar ataxia with neuropathy and bilateral vestibular areflexia syndrome (CANVAS). Analysis of whole-genome sequence (WGS) data with five independent algorithms identified a recessively inherited intronic repeat expansion [(AAGGG)exp] in the gene encoding Replication Factor C1 (RFC1). This motif, not reported in the reference sequence, localized to an Alu element and replaced the reference (AAAAG)11 short tandem repeat. Genetic analyses confirmed the pathogenic expansion in 18 of 22 CANVAS-affected families and identified a core ancestral haplotype, estimated to have arisen in Europe more than twenty-five thousand years ago. WGS of the four RFC1-negative CANVAS-affected families identified plausible variants in three, with genomic re-diagnosis of SCA3, spastic ataxia of the Charlevoix-Saguenay type, and SCA45. This study identified the genetic basis of CANVAS and demonstrated that these improved bioinformatics tools increase the diagnostic utility of WGS to determine the genetic basis of a heterogeneous group of clinically overlapping neurogenetic disorders.
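
As a rough illustration of the motif change described in the abstract, the sketch below counts the longest uninterrupted tandem run of a pentamer in a DNA string. It is not one of the five algorithms used in the study; the function name `longest_tandem_run`, the flanking bases, and the 40-copy expanded allele are hypothetical choices made only for this toy example. Real short-read data would also need alignment and read-level evidence, which this sketch ignores.

```python
import re


def longest_tandem_run(seq: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `seq`."""
    # A lookahead lets the scan restart at every position, so a longer run
    # that begins inside an earlier match is not missed.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(motif))
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, len(match.group(1)) // len(motif))
    return best


# Toy alleles (hypothetical flanks; the 40-copy length is arbitrary).
reference_like = "TTC" + "AAAAG" * 11 + "CCT"   # reference (AAAAG)11 repeat
expanded_like = "TTC" + "AAGGG" * 40 + "CCT"    # non-reference (AAGGG)exp motif

for name, allele in [("reference", reference_like), ("expanded", expanded_like)]:
    print(name,
          "AAAAG:", longest_tandem_run(allele, "AAAAG"),
          "AAGGG:", longest_tandem_run(allele, "AAGGG"))
```

Because the expanded motif is absent from the reference sequence, a search restricted to reference motifs would report nothing for the second allele, which is why the motif is passed in as a parameter here.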

Original language: English (US)
Pages (from-to): 151-165
Number of pages: 15
Journal: American Journal of Human Genetics
Volume: 105
Issue number: 1
DOI: https://doi.org/10.1016/j.ajhg.2019.05.016
State: Published - Jul 3 2019

Fingerprint

Cerebellar Ataxia
Computational Biology
Genome
Alu Elements
Molecular Medicine
Molecular Pathology
Clinical Medicine
Microsatellite Repeats
Haplotypes
Cohort Studies
Technology
Bilateral Vestibulopathy
Genes

Keywords

  • ataxia
  • CANVAS
  • repeat expansions
  • short tandem repeats
  • whole-genome sequencing

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS. / Rafehi, Haloom; Szmulewicz, David J.; Bennett, Mark F.; Sobreira, Nara L.M.; Pope, Kate; Smith, Katherine R.; Gillies, Greta; Diakumis, Peter; Dolzhenko, Egor; Eberle, Michael A.; Barcina, María García; Breen, David P.; Chancellor, Andrew M.; Cremer, Phillip D.; Delatycki, Martin B.; Fogel, Brent L.; Hackett, Anna; Halmagyi, G. Michael; Kapetanovic, Solange; Lang, Anthony; Mossman, Stuart; Mu, Weiyi; Patrikios, Peter; Perlman, Susan L.; Rosemergy, Ian; Storey, Elsdon; Watson, Shaun R.D.; Wilson, Michael A.; Zee, David S.; Valle, David; Amor, David J.; Bahlo, Melanie; Lockhart, Paul J.

In: American journal of human genetics, Vol. 105, No. 1, 03.07.2019, p. 151-165.

Rafehi, H, Szmulewicz, DJ, Bennett, MF, Sobreira, NLM, Pope, K, Smith, KR, Gillies, G, Diakumis, P, Dolzhenko, E, Eberle, MA, Barcina, MG, Breen, DP, Chancellor, AM, Cremer, PD, Delatycki, MB, Fogel, BL, Hackett, A, Halmagyi, GM, Kapetanovic, S, Lang, A, Mossman, S, Mu, W, Patrikios, P, Perlman, SL, Rosemergy, I, Storey, E, Watson, SRD, Wilson, MA, Zee, DS, Valle, D, Amor, DJ, Bahlo, M & Lockhart, PJ 2019, 'Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS', American journal of human genetics, vol. 105, no. 1, pp. 151-165. https://doi.org/10.1016/j.ajhg.2019.05.016
@article{c89a8639e54042d69cbbaeebe68111f4,
title = "Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS",
keywords = "ataxia, CANVAS, repeat expansions, short tandem repeats, whole-genome sequencing",
author = "Haloom Rafehi and Szmulewicz, {David J.} and Bennett, {Mark F.} and Sobreira, {Nara L.M.} and Kate Pope and Smith, {Katherine R.} and Greta Gillies and Peter Diakumis and Egor Dolzhenko and Eberle, {Michael A.} and Barcina, {Mar{\'i}a Garc{\'i}a} and Breen, {David P.} and Chancellor, {Andrew M.} and Cremer, {Phillip D.} and Delatycki, {Martin B.} and Fogel, {Brent L.} and Anna Hackett and Halmagyi, {G. Michael} and Solange Kapetanovic and Anthony Lang and Stuart Mossman and Weiyi Mu and Peter Patrikios and Perlman, {Susan L.} and Ian Rosemergy and Elsdon Storey and Watson, {Shaun R.D.} and Wilson, {Michael A.} and Zee, {David S.} and David Valle and Amor, {David J.} and Melanie Bahlo and Lockhart, {Paul J.}",
year = "2019",
month = "7",
day = "3",
doi = "10.1016/j.ajhg.2019.05.016",
language = "English (US)",
volume = "105",
pages = "151--165",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "1",

}

TY - JOUR

T1 - Bioinformatics-Based Identification of Expanded Repeats

T2 - A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS

AU - Rafehi, Haloom

AU - Szmulewicz, David J.

AU - Bennett, Mark F.

AU - Sobreira, Nara L.M.

AU - Pope, Kate

AU - Smith, Katherine R.

AU - Gillies, Greta

AU - Diakumis, Peter

AU - Dolzhenko, Egor

AU - Eberle, Michael A.

AU - Barcina, María García

AU - Breen, David P.

AU - Chancellor, Andrew M.

AU - Cremer, Phillip D.

AU - Delatycki, Martin B.

AU - Fogel, Brent L.

AU - Hackett, Anna

AU - Halmagyi, G. Michael

AU - Kapetanovic, Solange

AU - Lang, Anthony

AU - Mossman, Stuart

AU - Mu, Weiyi

AU - Patrikios, Peter

AU - Perlman, Susan L.

AU - Rosemergy, Ian

AU - Storey, Elsdon

AU - Watson, Shaun R.D.

AU - Wilson, Michael A.

AU - Zee, David S.

AU - Valle, David

AU - Amor, David J.

AU - Bahlo, Melanie

AU - Lockhart, Paul J.

PY - 2019/7/3

Y1 - 2019/7/3

KW - ataxia

KW - CANVAS

KW - repeat expansions

KW - short tandem repeats

KW - whole-genome sequencing

UR - http://www.scopus.com/inward/record.url?scp=85068056700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068056700&partnerID=8YFLogxK

U2 - 10.1016/j.ajhg.2019.05.016

DO - 10.1016/j.ajhg.2019.05.016

M3 - Article

C2 - 31230722

AN - SCOPUS:85068056700

VL - 105

SP - 151

EP - 165

JO - American Journal of Human Genetics

JF - American Journal of Human Genetics

SN - 0002-9297

IS - 1

ER -