TY - JOUR
T1 - A reference-quality, fully annotated genome from a Puerto Rican individual
AU - Zimin, Aleksey V.
AU - Shumate, Alaina
AU - Shinder, Ida
AU - Heinz, Jakob
AU - Puiu, Daniela
AU - Pertea, Mihaela
AU - Salzberg, Steven L.
N1 - Publisher Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America. All rights reserved.
PY - 2022/2
Y1 - 2022/2
N2 - Until 2019, the human genome was available in only one fully annotated version, GRCh38, which was the result of 18 years of continuous improvement and revision. Despite dramatic improvements in sequencing technology, no other genome was available as an annotated reference until 2019, when the genome of an Ashkenazi individual, Ash1, was released. In this study, we describe the assembly and annotation of a second individual genome, from a Puerto Rican individual whose DNA was collected as part of the Human Pangenome project. The new genome, called PR1, is the first true reference genome created from an individual of African descent. Due to recent improvements in both sequencing and assembly technology, and particularly to the use of the recently completed CHM13 human genome as a guide to assembly, PR1 is more complete and more contiguous than either GRCh38 or Ash1. Annotation revealed 37,755 genes (of which 19,999 are protein coding), including 12 additional gene copies that are present in PR1 and missing from CHM13. Fifty-seven genes have fewer copies in PR1 than in CHM13, 9 map only partially, and 3 genes (all noncoding) from CHM13 are entirely missing from PR1.
KW - DNA sequencing
KW - annotation
KW - genome assembly
KW - reference genome
KW - variant calling
UR - http://www.scopus.com/inward/record.url?scp=85124437656&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85124437656&partnerID=8YFLogxK
U2 - 10.1093/genetics/iyab227
DO - 10.1093/genetics/iyab227
M3 - Article
C2 - 34897437
AN - SCOPUS:85124437656
SN - 0016-6731
VL - 220
JO - Genetics
JF - Genetics
IS - 2
M1 - iyab227
ER -