Data descriptor: Whole genome sequencing data for two individuals of Pakistani descent

Shahid Khan, Firoz Kabir, Oussama M’Hamdi, Xiaodong Jiao, Muhammad Asif Naeem, Shaheen N. Khan, Sheikh Riazuddin, J. Fielding Hejtmancik, Sheikh Amer Riazuddin

Research output: Contribution to journal › Article

Abstract

Here we report next-generation whole genome sequencing of two individuals (H1 and H2) from a family of Pakistani descent. Genomic DNA was used to prepare paired-end libraries for whole-genome sequencing. Deep sequencing yielded 706.49 and 778.12 million mapped reads, corresponding to 70.64 and 77.81 Gb of sequence data and 23× and 25× average coverage for H1 and H2, respectively. Notably, 448,544 and 470,683 novel variants, not present in the single nucleotide polymorphism database (dbSNP), were identified in H1 and H2, respectively. Comparative analysis identified 2,415,852 variants common to both genomes, including 240,181 variants absent from dbSNP. Principal component analysis linked the ancestry of both genomes with South Asian populations. In conclusion, we report whole genome sequences of two individuals from a family of Pakistani descent.
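The read, yield, and coverage figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not the authors' pipeline: the genome size of about 3.1 Gb is an assumption (the abstract does not state the reference length used), and the function names are hypothetical. Under that assumption, the figures imply reads of about 100 bases and average coverage close to the reported 23× and 25×.

```python
# Figures taken directly from the abstract.
MAPPED_READS = {"H1": 706.49e6, "H2": 778.12e6}   # mapped reads
SEQUENCE_BASES = {"H1": 70.64e9, "H2": 77.81e9}   # sequence data in bases (Gb)

# ASSUMPTION: approximate haploid human reference length; not stated in the abstract.
ASSUMED_GENOME_SIZE_BP = 3.1e9


def implied_read_length(sample: str) -> float:
    """Average bases per mapped read implied by the reported totals."""
    return SEQUENCE_BASES[sample] / MAPPED_READS[sample]


def average_coverage(sample: str, genome_size_bp: float = ASSUMED_GENOME_SIZE_BP) -> float:
    """Average depth = total sequenced bases / genome size."""
    return SEQUENCE_BASES[sample] / genome_size_bp


if __name__ == "__main__":
    for sample in ("H1", "H2"):
        print(
            f"{sample}: ~{implied_read_length(sample):.1f} bases per read, "
            f"~{average_coverage(sample):.1f}x average coverage"
        )
```

A different assumed genome size shifts the coverage estimate proportionally, so small differences from the published values are expected.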

Original language: English (US)
Article number: 180174
Journal: Scientific data
Volume: 5
DOIs: https://doi.org/10.1038/sdata.2018.174
State: Published - Sep 11 2018

Fingerprint

Descent
Sequencing
Descriptors
Genome
Genes
Single nucleotide Polymorphism
Nucleotides
Polymorphism
Comparative Analysis
Principal Component Analysis
Genomics
Coverage
DNA
Family

ASJC Scopus subject areas

  • Statistics and Probability
  • Information Systems
  • Education
  • Computer Science Applications
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences

Cite this

Data descriptor: Whole genome sequencing data for two individuals of Pakistani descent. / Khan, Shahid; Kabir, Firoz; M’Hamdi, Oussama; Jiao, Xiaodong; Naeem, Muhammad Asif; Khan, Shaheen N.; Riazuddin, Sheikh; Hejtmancik, J. Fielding; Riazuddin, Sheikh Amer.

In: Scientific data, Vol. 5, 180174, 11.09.2018.

Khan, S, Kabir, F, M’Hamdi, O, Jiao, X, Naeem, MA, Khan, SN, Riazuddin, S, Hejtmancik, JF & Riazuddin, SA 2018, 'Data descriptor: Whole genome sequencing data for two individuals of Pakistani descent', Scientific data, vol. 5, 180174. https://doi.org/10.1038/sdata.2018.174
Khan, Shahid; Kabir, Firoz; M’Hamdi, Oussama; Jiao, Xiaodong; Naeem, Muhammad Asif; Khan, Shaheen N.; Riazuddin, Sheikh; Hejtmancik, J. Fielding; Riazuddin, Sheikh Amer. / Data descriptor: Whole genome sequencing data for two individuals of Pakistani descent. In: Scientific data. 2018; Vol. 5.
@article{5d26af3ada2f42e4846fe764b94cdf82,
title = "Data descriptor: Whole genome sequencing data for two individuals of Pakistani descent",
abstract = "Here we report next-generation based whole genome sequencing of two individuals (H1 and H2) from a family of Pakistani descent. The genomic DNA was used to prepare paired-end libraries for whole-genome sequencing. Deep sequencing yielded 706.49 and 778.12 million mapped reads corresponding to 70.64 and 77.81 Gb sequence data and 23 × and 25 × average coverage for H1 and H2, respectively. Notably, a total of 448,544 and 470,683 novel variants, not present in the single nucleotide polymorphism database (dbSNP), were identified in H1 and H2, respectively. Comparative analysis identified 2,415,852 variants common in both genomes including 240,181 variants absent in the dbSNP. Principal component analysis linked the ancestry of both genomes with South Asian populations. In conclusion, we report whole genome sequences of two individuals from a family of Pakistani descent.",
author = "Shahid Khan and Firoz Kabir and Oussama M’Hamdi and Xiaodong Jiao and Naeem, {Muhammad Asif} and Khan, {Shaheen N.} and Sheikh Riazuddin and Hejtmancik, {J. Fielding} and Riazuddin, {Sheikh Amer}",
year = "2018",
month = "9",
day = "11",
doi = "10.1038/sdata.2018.174",
language = "English (US)",
volume = "5",
journal = "Scientific data",
issn = "2052-4463",
publisher = "Nature Publishing Group",

}

TY - JOUR

T1 - Data descriptor

T2 - Whole genome sequencing data for two individuals of Pakistani descent

AU - Khan, Shahid

AU - Kabir, Firoz

AU - M’Hamdi, Oussama

AU - Jiao, Xiaodong

AU - Naeem, Muhammad Asif

AU - Khan, Shaheen N.

AU - Riazuddin, Sheikh

AU - Hejtmancik, J. Fielding

AU - Riazuddin, Sheikh Amer

PY - 2018/9/11

Y1 - 2018/9/11

AB - Here we report next-generation based whole genome sequencing of two individuals (H1 and H2) from a family of Pakistani descent. The genomic DNA was used to prepare paired-end libraries for whole-genome sequencing. Deep sequencing yielded 706.49 and 778.12 million mapped reads corresponding to 70.64 and 77.81 Gb sequence data and 23 × and 25 × average coverage for H1 and H2, respectively. Notably, a total of 448,544 and 470,683 novel variants, not present in the single nucleotide polymorphism database (dbSNP), were identified in H1 and H2, respectively. Comparative analysis identified 2,415,852 variants common in both genomes including 240,181 variants absent in the dbSNP. Principal component analysis linked the ancestry of both genomes with South Asian populations. In conclusion, we report whole genome sequences of two individuals from a family of Pakistani descent.

UR - http://www.scopus.com/inward/record.url?scp=85053309936&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053309936&partnerID=8YFLogxK

U2 - 10.1038/sdata.2018.174

DO - 10.1038/sdata.2018.174

M3 - Article

C2 - 30204152

AN - SCOPUS:85053309936

VL - 5

JO - Scientific data

JF - Scientific data

SN - 2052-4463

M1 - 180174

ER -