IVT-seq reveals extreme bias in RNA sequencing

Nicholas F. Lahens; Ibrahim Halil Kavakli; Ray Zhang; Katharina Hayer; Michael B. Black; Hannah Dueck; Angel Pizarro; Junhyong Kim; Rafael Irizarry; Russell S. Thomas; Gregory R. Grant; John B. Hogenesch

doi:10.1186/gb-2014-15-6-r86

Research output: Contribution to journal › Article › peer-review

91 Scopus citations

Abstract

Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.

Original language	English (US)
Article number	R86
Journal	Genome Biology
Volume	15
Issue number	6
DOIs	https://doi.org/10.1186/gb-2014-15-6-r86
State	Published - Jun 30 2014
Externally published	Yes

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Genetics
Cell Biology

Access to Document

10.1186/gb-2014-15-6-r86

Cite this

@article{92389d5f2ea9465cb4808a6362e4abc5,

title = "IVT-seq reveals extreme bias in RNA sequencing",

abstract = "Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.",

author = "Lahens, {Nicholas F.} and Kavakli, {Ibrahim Halil} and Ray Zhang and Katharina Hayer and Black, {Michael B.} and Hannah Dueck and Angel Pizarro and Junhyong Kim and Rafael Irizarry and Thomas, {Russell S.} and Grant, {Gregory R.} and Hogenesch, {John B.}",

note = "Funding Information: We would like to thank the Penn Genome Frontiers Institute sequencing core, the Institute for Diabetes, Obesity and Metabolism, the DRC grant (P30DK19525), and the services of the Functional Genomics Core for performing the Illumina sequencing. JBH is supported by the National Institutes of Health (NIH) grants 2-R01-NS054794-06 and 5-R01-HL097800-04 and by DARPA [12-DARPA-1068] (to John Harer, Duke University). GRG is supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, NIH, through Grant UL1TR000003. This project is funded, in part, by the Penn Genome Frontiers Institute under an HRFF grant with the Pennsylvania Department of Health, which disclaims responsibility for any analyses, interpretations, or conclusions. This project is also supported in part by the Institute for Translational Medicine and Therapeutics of the Perelman School of Medicine at the University of Pennsylvania. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Publisher Copyright: {\textcopyright} 2014 Lahens et al.",

year = "2014",

month = jun,

day = "30",

doi = "10.1186/gb-2014-15-6-r86",

language = "English (US)",

volume = "15",

journal = "Genome Biology",

issn = "1474-7596",

publisher = "BioMed Central",

number = "6",

}

TY - JOUR

T1 - IVT-seq reveals extreme bias in RNA sequencing

AU - Lahens, Nicholas F.

AU - Kavakli, Ibrahim Halil

AU - Zhang, Ray

AU - Hayer, Katharina

AU - Black, Michael B.

AU - Dueck, Hannah

AU - Pizarro, Angel

AU - Kim, Junhyong

AU - Irizarry, Rafael

AU - Thomas, Russell S.

AU - Grant, Gregory R.

AU - Hogenesch, John B.

N1 - Funding Information: We would like to thank the Penn Genome Frontiers Institute sequencing core, the Institute for Diabetes, Obesity and Metabolism, the DRC grant (P30DK19525), and the services of the Functional Genomics Core for performing the Illumina sequencing. JBH is supported by the National Institutes of Health (NIH) grants 2-R01-NS054794-06 and 5-R01-HL097800-04 and by DARPA [12-DARPA-1068] (to John Harer, Duke University). GRG is supported by the National Center for Research Resources and the National Center for Advancing Translational Sciences, NIH, through Grant UL1TR000003. This project is funded, in part, by the Penn Genome Frontiers Institute under an HRFF grant with the Pennsylvania Department of Health, which disclaims responsibility for any analyses, interpretations, or conclusions. This project is also supported in part by the Institute for Translational Medicine and Therapeutics of the Perelman School of Medicine at the University of Pennsylvania. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Publisher Copyright: © 2014 Lahens et al.

PY - 2014/6/30

Y1 - 2014/6/30

N2 - Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.

AB - Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.

UR - http://www.scopus.com/inward/record.url?scp=84911861819&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84911861819&partnerID=8YFLogxK

U2 - 10.1186/gb-2014-15-6-r86

DO - 10.1186/gb-2014-15-6-r86

M3 - Article

C2 - 24981968

AN - SCOPUS:84911861819

SN - 1474-7596

VL - 15

JO  - Genome Biology

JF  - Genome Biology

IS - 6

M1 - R86

ER -
