Visualization and probability-based scoring of structural variants within repetitive sequences

Eitan Halper-Stromberg, Jared Steranka, Kathleen Burns, Sarven Sabunciyan, Rafael A. Irizarry

Research output: Contribution to journal (Article)

Abstract

Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next-generation sequencers is challenging and requires a different set of analytical techniques than those used for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the immunoglobulin (Ig) and T-cell receptor (TCR) loci.

Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method by its ability to separate real structural variants from false positives generated by existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell line.

Original language: English (US)
Pages (from-to): 1514-1521
Number of pages: 8
Journal: Bioinformatics
Volume: 30
Issue number: 11
DOI: 10.1093/bioinformatics/btu054
State: Published - Jun 1 2014

Fingerprint

Nucleic Acid Repetitive Sequences
T-Cell Antigen Receptor
Scoring
Immunoglobulin
Genome
T-cells
Visualization
Genes
Receptor
Transformed Cell Line
Sequencing
Locus
Immunoglobulins
Line
Cell
Human Genome
Tumor Cell Line
Human Herpesvirus 4
Cells
Artifacts

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)

Cite this

Visualization and probability-based scoring of structural variants within repetitive sequences. / Halper-Stromberg, Eitan; Steranka, Jared; Burns, Kathleen; Sabunciyan, Sarven; Irizarry, Rafael A.

In: Bioinformatics, Vol. 30, No. 11, 01.06.2014, p. 1514-1521.

@article{f2cc7eaa1cfc4bf592eeda2afc207d0b,
title = "Visualization and probability-based scoring of structural variants within repetitive sequences",
author = "Eitan Halper-Stromberg and Jared Steranka and Kathleen Burns and Sarven Sabunciyan and Irizarry, {Rafael A.}",
year = "2014",
month = "6",
day = "1",
doi = "10.1093/bioinformatics/btu054",
language = "English (US)",
volume = "30",
pages = "1514--1521",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",

}

TY - JOUR
T1 - Visualization and probability-based scoring of structural variants within repetitive sequences
AU - Halper-Stromberg, Eitan
AU - Steranka, Jared
AU - Burns, Kathleen
AU - Sabunciyan, Sarven
AU - Irizarry, Rafael A.
PY - 2014/6/1
Y1 - 2014/6/1
UR - http://www.scopus.com/inward/record.url?scp=84901308393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901308393&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btu054
DO - 10.1093/bioinformatics/btu054
M3 - Article
C2 - 24501098
AN - SCOPUS:84901308393
VL - 30
SP - 1514
EP - 1521
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 11
ER -