TY - JOUR
T1 - Chromosome-scale assembly of the bread wheat genome reveals thousands of additional gene copies
AU - Alonge, Michael
AU - Shumate, Alaina
AU - Puiu, Daniela
AU - Zimin, Aleksey V.
AU - Salzberg, Steven L.
N1 - Funding Information:
This work was supported in part by the National Institutes of Health (NIH) under grants R01-HG006677 and R35-GM130151, and by the United States Department of Agriculture (USDA) National Institute of Food and Agriculture under grant 2018-67015-28199.
Publisher Copyright:
© 2020 Genetics. All rights reserved.
PY - 2020/10
Y1 - 2020/10
N2 - Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications including at the Ppd-B1 photoperiod response locus.
AB - Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of nongap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2000 genes that were previously unplaced. We also discovered >5700 additional gene copies, facilitating the accurate annotation of functional gene duplications including at the Ppd-B1 photoperiod response locus.
KW - Gene annotation
KW - Gene duplication
KW - Genome assembly
KW - Scaffolding
KW - Wheat
UR - http://www.scopus.com/inward/record.url?scp=85092318278&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092318278&partnerID=8YFLogxK
U2 - 10.1534/genetics.120.303501
DO - 10.1534/genetics.120.303501
M3 - Article
C2 - 32796007
AN - SCOPUS:85092318278
SN - 0016-6731
VL - 216
SP - 599
EP - 608
JO - Genetics
JF - Genetics
IS - 2
ER -