TY - JOUR
T1 - A phased Canis lupus familiaris Labrador Retriever reference genome utilizing high molecular weight DNA extraction methods and high resolution sequencing technologies
AU - Player, Robert A.
AU - Forsyth, Ellen R.
AU - Verratti, Kathleen J.
AU - Mohr, David W.
AU - Scott, Alan F.
AU - Bradburne, Christopher E.
N1 - Publisher Copyright:
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
PY - 2020/8/27
Y1 - 2020/8/27
AB - Reference genome fidelity is critically important for genome-wide association studies (GWAS), yet many reference genomes are incomplete or too dissimilar from the study population. A typical whole genome sequencing approach relies on short-read technologies, resulting in fragmented assemblies with regions of ambiguity and low complexity. Further information is lost by economic necessity when genotyping populations, as lower-resolution technologies such as genotyping arrays are commonly utilized. Here we present a phased reference genome for Canis lupus familiaris utilizing high molecular weight sequencing technologies. We tested wet lab and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The resulting de novo assembly required eight Oxford Nanopore R9.4 flowcells (~23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (~88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K. Mapping of publicly available short-read data from ten Labrador Retrievers against this breed-specific reference resulted in an average of approximately 1% more aligned reads compared to mapping against the current gold standard reference (CanFam3.1, p<0.001), indicating a more complete breed-specific reference. An average 15% reduction of variant calls was observed from the same mapped data, which increases the chance of identifying low-effect-size variants in a GWAS. We believe that by incorporating the cost to produce a full genome assembly into any large-scale canine genotyping study, an investigator can make an informed cost/benefit analysis regarding genotyping technology.
KW - Canis lupus familiaris
KW - De novo assembly
KW - Labrador Retriever
KW - Phased genome assembly
UR - http://www.scopus.com/inward/record.url?scp=85098908084&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098908084&partnerID=8YFLogxK
U2 - 10.1101/2020.08.26.269076
DO - 10.1101/2020.08.26.269076
M3 - Article
AN - SCOPUS:85098908084
JO - bioRxiv
JF - bioRxiv
ER -