TY - JOUR
T1 - Deep Learning in Protein Structural Modeling and Design
AU - Gao, Wenhao
AU - Mahajan, Sai Pooja
AU - Sulam, Jeremias
AU - Gray, Jeffrey J.
N1 - Funding Information:
This work was supported by the NIH through grant R01-GM078221. We thank Dr. Justin S. Smith at the Center for Nonlinear Studies at Los Alamos National Laboratory, NM, for helpful discussion and Dr. Andrew D. White at the Department of Chemical Engineering at University of Rochester, NY, and Alexander Rives at the Department of Computer Science at New York University, NY, for helpful suggestions. We are also grateful for insightful suggestions from the reviewers. Conceptualization, W.G. and J.J.G.; Investigation, W.G. and S.P.M.; Writing – Original Draft, W.G.; Writing – Review & Editing, W.G., S.P.M., J.S., and J.J.G.; Funding Acquisition, J.J.G.; Resources, J.J.G.; Supervision, J.S. and J.J.G.
Publisher Copyright:
© 2020 The Authors
PY - 2020/12/11
Y1 - 2020/12/11
N2 - Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the “sequence → structure → function” paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques. Proteins are linear polymers that fold into an incredible variety of three-dimensional structures that enable sophisticated functionality for biology. Computational modeling allows scientists to predict the three-dimensional structure of proteins from genomes, predict properties or behavior of a protein, and even modify or design new proteins for a desired function. Advances in machine learning, especially deep learning, are catalyzing a revolution in the paradigm of scientific research. In this review, we summarize recent work in applying deep learning techniques to tackle problems in protein structural modeling and design. Some deep learning-based approaches, especially in structure prediction, now outperform conventional methods, often in combination with higher-resolution physical modeling. 
Challenges remain in experimental validation, benchmarking, leveraging known physics and interpreting models, and extending to other biomolecules and contexts. Proteins fold into an incredible variety of three-dimensional structures to enable sophisticated functionality in biology. Advances in machine learning, especially in deep learning-related techniques, have opened up new avenues in many areas of protein modeling and design. This review dissects the emerging approaches and discusses advances and challenges that must be addressed.
KW - DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
KW - deep generative model
KW - deep learning
KW - protein design
KW - protein folding
KW - representation learning
UR - http://www.scopus.com/inward/record.url?scp=85096850058&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85096850058&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2020.100142
DO - 10.1016/j.patter.2020.100142
M3 - Review article
C2 - 33336200
AN - SCOPUS:85096850058
VL - 1
JO - Patterns
JF - Patterns
SN - 2666-3899
IS - 9
M1 - 100142
ER -