Visualization and probability-based scoring of structural variants within repetitive sequences

Eitan Halper-Stromberg; Jared Steranka; Kathleen H. Burns; Sarven Sabunciyan; Rafael A. Irizarry

doi:10.1093/bioinformatics/btu054

School of Medicine

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next generation sequencers is challenging, and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the Immunoglobulin (Ig) and T-cell receptor (TCR) loci.

Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method in its ability to separate real structural variants from false positives generated with existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell-line.

Original language	English (US)
Pages (from-to)	1514-1521
Number of pages	8
Journal	Bioinformatics
Volume	30
Issue number	11
DOIs	https://doi.org/10.1093/bioinformatics/btu054
State	Published - Jun 1 2014

ASJC Scopus subject areas

Statistics and Probability
Biochemistry
Molecular Biology
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Cite this

@article{f2cc7eaa1cfc4bf592eeda2afc207d0b,
title = "Visualization and probability-based scoring of structural variants within repetitive sequences",
abstract = "Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next generation sequencers is challenging, and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the Immunoglobulin (Ig) and T-cell receptor (TCR) loci. Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method in its ability to separate real structural variants from false positives generated with existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell-line.",
author = "Halper-Stromberg, Eitan and Steranka, Jared and Burns, {Kathleen H.} and Sabunciyan, Sarven and Irizarry, {Rafael A.}",
year = "2014",
month = jun,
day = "1",
doi = "10.1093/bioinformatics/btu054",
language = "English (US)",
volume = "30",
pages = "1514--1521",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",
}

TY - JOUR
T1 - Visualization and probability-based scoring of structural variants within repetitive sequences
AU - Halper-Stromberg, Eitan
AU - Steranka, Jared
AU - Burns, Kathleen H.
AU - Sabunciyan, Sarven
AU - Irizarry, Rafael A.
PY - 2014/6/1
Y1 - 2014/6/1
N2 - Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next generation sequencers is challenging, and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the Immunoglobulin (Ig) and T-cell receptor (TCR) loci. Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method in its ability to separate real structural variants from false positives generated with existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell-line.
AB - Motivation: Repetitive sequences account for approximately half of the human genome. Accurately ascertaining sequences in these regions with next generation sequencers is challenging, and requires a different set of analytical techniques than for reads originating from unique sequences. Complicating the matter are repetitive regions subject to programmed rearrangements, as is the case with the antigen-binding domains in the Immunoglobulin (Ig) and T-cell receptor (TCR) loci. Results: We developed a probability-based score and visualization method to aid in distinguishing true structural variants from alignment artifacts. We demonstrate the usefulness of this method in its ability to separate real structural variants from false positives generated with existing upstream analysis tools. We validated our approach using both target-capture and whole-genome experiments. Capture sequencing reads were generated from primary lymphoid tumors, cancer cell lines and an EBV-transformed lymphoblast cell line over the Ig and TCR loci. Whole-genome sequencing reads were from a lymphoblastoid cell-line.
UR - http://www.scopus.com/inward/record.url?scp=84901308393&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84901308393&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btu054
DO - 10.1093/bioinformatics/btu054
M3 - Article
C2 - 24501098
AN - SCOPUS:84901308393
SN - 1367-4803
VL - 30
SP - 1514
EP - 1521
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -
