TY - JOUR
T1 - Analyzing whole genome bisulfite sequencing data from highly divergent genotypes
AU - Wulfridge, Phillip
AU - Langmead, Ben
AU - Feinberg, Andrew P.
AU - Hansen, Kasper D.
N1 - Publisher Copyright:
© 2019 Oxford University Press. All rights reserved.
PY - 2019/11/4
Y1 - 2019/11/4
N2 - In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation in an analysis of the interplay between sequence variation and DNA methylation is not well understood, especially when the number of CpG differences between samples is large. Here, we use whole-genome bisulfite sequencing data on two highly divergent mouse strains to study this problem. We show that alignment to personal genomes is necessary for valid methylation quantification. We introduce a method for including strain-specific CpGs in differential analysis, and show that this increases power. We apply our method to a human normal-cancer dataset, and show this improves accuracy and power, illustrating the broad applicability of our approach. Our method uses smoothing to impute methylation levels at strain-specific sites, thereby allowing strain-specific CpGs to contribute to the analysis, while accounting for differences in the spatial occurrences of CpGs. Our results have implications for joint analysis of genetic variation and DNA methylation using bisulfite-converted DNA, and unlock the use of personal genomes for addressing this question.
AB - In the study of DNA methylation, genetic variation between species, strains or individuals can result in CpG sites that are exclusive to a subset of samples, and insertions and deletions can rearrange the spatial distribution of CpGs. How to account for this variation in an analysis of the interplay between sequence variation and DNA methylation is not well understood, especially when the number of CpG differences between samples is large. Here, we use whole-genome bisulfite sequencing data on two highly divergent mouse strains to study this problem. We show that alignment to personal genomes is necessary for valid methylation quantification. We introduce a method for including strain-specific CpGs in differential analysis, and show that this increases power. We apply our method to a human normal-cancer dataset, and show this improves accuracy and power, illustrating the broad applicability of our approach. Our method uses smoothing to impute methylation levels at strain-specific sites, thereby allowing strain-specific CpGs to contribute to the analysis, while accounting for differences in the spatial occurrences of CpGs. Our results have implications for joint analysis of genetic variation and DNA methylation using bisulfite-converted DNA, and unlock the use of personal genomes for addressing this question.
UR - http://www.scopus.com/inward/record.url?scp=85074308717&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074308717&partnerID=8YFLogxK
U2 - 10.1093/nar/gkz674
DO - 10.1093/nar/gkz674
M3 - Article
C2 - 31392989
AN - SCOPUS:85074308717
SN - 0305-1048
VL - 47
SP - E117
JO - Nucleic acids research
JF - Nucleic acids research
IS - 19
ER -