Deep Learning in Protein Structural Modeling and Design

Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, Jeffrey J. Gray

Research output: Contribution to journal, Review article, peer-reviewed

Abstract

Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical for understanding and engineering biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the “sequence → structure → function” paradigm. This review aims to help computational biologists gain familiarity with the deep learning methods applied in protein modeling, and computer scientists gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.

Proteins are linear polymers that fold into an incredible variety of three-dimensional structures that enable sophisticated functionality for biology. Computational modeling allows scientists to predict the three-dimensional structure of proteins from genomes, predict properties or behavior of a protein, and even modify or design new proteins for a desired function. Advances in machine learning, especially deep learning, are catalyzing a revolution in the paradigm of scientific research. In this review, we summarize recent work in applying deep learning techniques to tackle problems in protein structural modeling and design. Some deep learning-based approaches, especially in structure prediction, now outperform conventional methods, often in combination with higher-resolution physical modeling. Challenges remain in experimental validation, benchmarking, leveraging known physics and interpreting models, and extending to other biomolecules and contexts.

Proteins fold into an incredible variety of three-dimensional structures to enable sophisticated functionality in biology. Advances in machine learning, especially in deep learning-related techniques, have opened up new avenues in many areas of protein modeling and design. This review dissects the emerging approaches and discusses advances and challenges that must be addressed.

Original language: English (US)
Article number: 100142
Journal: Patterns
Volume: 1
Issue number: 9
State: Published - Dec 11 2020

Keywords

  • DSML 3: Development/Pre-production: Data science output has been rolled out/validated across multiple domains/problems
  • deep generative model
  • deep learning
  • protein design
  • protein folding
  • representation learning

ASJC Scopus subject areas

  • Decision Sciences (all)
