TY - JOUR
T1 - A novel Canis lupus familiaris reference genome improves variant resolution for use in breed-specific GWAS
AU - Player, Robert A.
AU - Forsyth, Ellen R.
AU - Verratti, Kathleen J.
AU - Mohr, David W.
AU - Scott, Alan F.
AU - Bradburne, Christopher E.
N1 - Funding Information:
Karen L Meidenbauer, Doctor of Veterinary Medicine (DVM), is acknowledged for her technical leadership and expertise, as well as her participation in drawing blood, which was transported via shippable Pelican case with all lab equipment, reagents, and samples. David M Deglau and Michael A House are gratefully acknowledged for project and program management, respectively. Jody BG Proescher is acknowledged for her critical review of and editorial feedback on the manuscript. Funding for this project was provided by the Department of Homeland Security Science and Technology (S&T) Directorate, Contract No. 70RSAT19CB0000002.
Publisher Copyright:
© 2021 Rockefeller University Press. All rights reserved.
PY - 2021/4
Y1 - 2021/4
N2 - Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes vary widely from the study population. A typical whole genome sequencing approach implies short-read technologies resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flow cells (~23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flow cell (~88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction of variant calls, increasing the chance of identifying true, low-effect-size variants in genome-wide association studies. We believe that by incorporating the cost to produce a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
AB - Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes vary widely from the study population. A typical whole genome sequencing approach implies short-read technologies resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flow cells (~23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flow cell (~88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction of variant calls, increasing the chance of identifying true, low-effect-size variants in genome-wide association studies. We believe that by incorporating the cost to produce a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.
UR - http://www.scopus.com/inward/record.url?scp=85100961352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85100961352&partnerID=8YFLogxK
U2 - 10.26508/LSA.202000902
DO - 10.26508/LSA.202000902
M3 - Article
C2 - 33514656
AN - SCOPUS:85100961352
SN - 2575-1077
VL - 4
JO - Life Science Alliance
JF - Life Science Alliance
IS - 4
M1 - e202000902
ER -