TY - JOUR
T1 - First draft assembly and annotation of the genome of a California endemic oak Quercus lobata Née (Fagaceae)
AU - Sork, Victoria L.
AU - Fitz-Gibbon, Sorel T.
AU - Puiu, Daniela
AU - Crepeau, Marc
AU - Gugger, Paul F.
AU - Sherman, Rachel
AU - Stevens, Kristian
AU - Langley, Charles H.
AU - Pellegrini, Matteo
AU - Salzberg, Steven L.
N1 - Funding Information:
We thank Andy Lentz for fieldwork and Krista Beckley for lab work. We thank the BUSCO authors for early access to their plant database. For sequencing, we acknowledge the support of the Vincent J. Coates Genomics Sequencing Laboratory at the California Institute for Quantitative Biosciences, Berkeley. We acknowledge the University of California, Santa Barbara and the University of California Natural Reserve System for access to the Sedgwick Reserve. Gene bioinformatic analyses were conducted on the Hoffman2 Cluster at UCLA. This work was supported by seed funding from UCLA to V.L.S. and in part by National Science Foundation grant IOS-1444611 to V.L.S., M.P., P.F.G., and S.L.S., and by National Institutes of Health grant R01-HG006677 to S.L.S.
Publisher Copyright:
© 2016 Sork et al.
PY - 2016
Y1 - 2016
N2 - Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb, with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ~720-730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in >1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37-52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
AB - Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds, totaling 1.17 Gb, with 18,512 scaffolds of length 2 kb or longer, with a total length of 1.15 Gb, and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ~720-730 Mb, which is smaller than the total length due to high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched 956 Benchmarking Universal Single-Copy Orthologs, and found 863 complete orthologs, of which 450 were present in >1 copy. We also examined an earlier version (v0.5) where duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37-52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
KW - Adaptation
KW - Annotation
KW - Chloroplast
KW - GenPred
KW - Genomic Selection
KW - Nuclear genome assembly
KW - Quercus
KW - Shared Data Resources
UR - http://www.scopus.com/inward/record.url?scp=84996524143&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84996524143&partnerID=8YFLogxK
U2 - 10.1534/g3.116.030411
DO - 10.1534/g3.116.030411
M3 - Article
AN - SCOPUS:84996524143
SN - 2160-1836
VL - 6
SP - 3485
EP - 3495
JO - G3: Genes, Genomes, Genetics
JF - G3: Genes, Genomes, Genetics
IS - 11
ER -