Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay

Dustin Shigaki, Orit Adato, Aashish N. Adhikari, Shengcheng Dong, Alex Hawkins-Hooker, Fumitaka Inoue, Tamar Juven-Gershon, Henry Kenlay, Beth Martin, Ayoti Patra, Dmitry D. Penzar, Max Schubach, Chenling Xiong, Zhongxia Yan, Alan P. Boyle, Anat Kreimer, Ivan V. Kulakovskiy, John Reid, Ron Unger, Nir Yosef, Jay Shendure, Nadav Ahituv, Martin Kircher, Michael Beer

Research output: Contribution to journal › Article

Abstract

The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

Original language: English (US)
Pages (from-to): 1280-1291
Number of pages: 12
Journal: Human Mutation
Volume: 40
Issue number: 9
DOIs: https://doi.org/10.1002/humu.23797
State: Published - Sep 1 2019

Fingerprint

Epigenomics
Mutagenesis
Base Pairing
Libraries
Chromatin
Plasmids
Transcription Factors
Nucleotides
Binding Sites
Cell Line
Mutation
DNA
Machine Learning

Keywords

  • enhancers
  • gene regulation
  • machine learning
  • MPRA
  • promoters
  • regulatory variation

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. / Shigaki, Dustin; Adato, Orit; Adhikari, Aashish N.; Dong, Shengcheng; Hawkins-Hooker, Alex; Inoue, Fumitaka; Juven-Gershon, Tamar; Kenlay, Henry; Martin, Beth; Patra, Ayoti; Penzar, Dmitry D.; Schubach, Max; Xiong, Chenling; Yan, Zhongxia; Boyle, Alan P.; Kreimer, Anat; Kulakovskiy, Ivan V.; Reid, John; Unger, Ron; Yosef, Nir; Shendure, Jay; Ahituv, Nadav; Kircher, Martin; Beer, Michael.

In: Human mutation, Vol. 40, No. 9, 01.09.2019, p. 1280-1291.

Shigaki, D, Adato, O, Adhikari, AN, Dong, S, Hawkins-Hooker, A, Inoue, F, Juven-Gershon, T, Kenlay, H, Martin, B, Patra, A, Penzar, DD, Schubach, M, Xiong, C, Yan, Z, Boyle, AP, Kreimer, A, Kulakovskiy, IV, Reid, J, Unger, R, Yosef, N, Shendure, J, Ahituv, N, Kircher, M & Beer, M 2019, 'Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay', Human mutation, vol. 40, no. 9, pp. 1280-1291. https://doi.org/10.1002/humu.23797
Shigaki, Dustin ; Adato, Orit ; Adhikari, Aashish N. ; Dong, Shengcheng ; Hawkins-Hooker, Alex ; Inoue, Fumitaka ; Juven-Gershon, Tamar ; Kenlay, Henry ; Martin, Beth ; Patra, Ayoti ; Penzar, Dmitry D. ; Schubach, Max ; Xiong, Chenling ; Yan, Zhongxia ; Boyle, Alan P. ; Kreimer, Anat ; Kulakovskiy, Ivan V. ; Reid, John ; Unger, Ron ; Yosef, Nir ; Shendure, Jay ; Ahituv, Nadav ; Kircher, Martin ; Beer, Michael. / Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay. In: Human mutation. 2019 ; Vol. 40, No. 9. pp. 1280-1291.
@article{282371daa58841419d6c25d01525a945,
title = "Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay",
abstract = "The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.",
keywords = "enhancers, gene regulation, machine learning, MPRA, promoters, regulatory variation",
author = "Dustin Shigaki and Orit Adato and Adhikari, {Aashish N.} and Shengcheng Dong and Alex Hawkins-Hooker and Fumitaka Inoue and Tamar Juven-Gershon and Henry Kenlay and Beth Martin and Ayoti Patra and Penzar, {Dmitry D.} and Max Schubach and Chenling Xiong and Zhongxia Yan and Boyle, {Alan P.} and Anat Kreimer and Kulakovskiy, {Ivan V.} and John Reid and Ron Unger and Nir Yosef and Jay Shendure and Nadav Ahituv and Martin Kircher and Michael Beer",
year = "2019",
month = "9",
day = "1",
doi = "10.1002/humu.23797",
language = "English (US)",
volume = "40",
pages = "1280--1291",
journal = "Human Mutation",
issn = "1059-7794",
publisher = "Wiley-Liss Inc.",
number = "9",

}

TY - JOUR
T1 - Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay
AU - Shigaki, Dustin
AU - Adato, Orit
AU - Adhikari, Aashish N.
AU - Dong, Shengcheng
AU - Hawkins-Hooker, Alex
AU - Inoue, Fumitaka
AU - Juven-Gershon, Tamar
AU - Kenlay, Henry
AU - Martin, Beth
AU - Patra, Ayoti
AU - Penzar, Dmitry D.
AU - Schubach, Max
AU - Xiong, Chenling
AU - Yan, Zhongxia
AU - Boyle, Alan P.
AU - Kreimer, Anat
AU - Kulakovskiy, Ivan V.
AU - Reid, John
AU - Unger, Ron
AU - Yosef, Nir
AU - Shendure, Jay
AU - Ahituv, Nadav
AU - Kircher, Martin
AU - Beer, Michael
PY - 2019/9/1
Y1 - 2019/9/1
AB - The integrative analysis of high-throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease-associated human enhancers and nine disease-associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell-types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease-associated genetic variation.

KW - enhancers
KW - gene regulation
KW - machine learning
KW - MPRA
KW - promoters
KW - regulatory variation
UR - http://www.scopus.com/inward/record.url?scp=85071290740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85071290740&partnerID=8YFLogxK
U2 - 10.1002/humu.23797
DO - 10.1002/humu.23797
M3 - Article
C2 - 31106481
AN - SCOPUS:85071290740
VL - 40
SP - 1280
EP - 1291
JO - Human Mutation
JF - Human Mutation
SN - 1059-7794
IS - 9
ER -