TY - JOUR
T1 - Gradients do grow on trees
T2 - A linear-time O(N)-dimensional gradient for statistical phylogenetics
AU - Ji, Xiang
AU - Zhang, Zhenyu
AU - Holbrook, Andrew
AU - Nishimura, Akihiko
AU - Baele, Guy
AU - Rambaut, Andrew
AU - Lemey, Philippe
AU - Suchard, Marc A.
N1 - Publisher Copyright:
© 2020 The Author(s).
PY - 2020/10/1
Y1 - 2020/10/1
N2 - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
AB - Calculation of the log-likelihood stands as the computational bottleneck for many statistical phylogenetic algorithms. Even worse is its gradient evaluation, often used to target regions of high probability. Order O(N)-dimensional gradient calculations based on the standard pruning algorithm require O(N2) operations, wheres N is the number of sampled molecular sequences. With the advent of high-throughput sequencing, recent phylogenetic studies have analyzed hundreds to thousands of sequences, with an apparent trend toward even larger data sets as a result of advancing technology. Such large-scale analyses challenge phylogenetic reconstruction by requiring inference on larger sets of process parameters to model the increasing data heterogeneity. To make these analyses tractable, we present a linear-time algorithm for O(N)-dimensional gradient evaluation and apply it to general continuous-time Markov processes of sequence substitution on a phylogenetic tree without a need to assume either stationarity or reversibility. We apply this approach to learn the branch-specific evolutionary rates of three pathogenic viruses: West Nile virus, Dengue virus, and Lassa virus. Our proposed algorithmsignificantly improves inference efficiency with a 126- to 234-fold increase in maximum-likelihood optimization and a 16- to 33-fold computational performance increase in a Bayesian framework.
KW - Bayesian inference
KW - Linear-time gradient algorithm
KW - Maximum likelihood
KW - Random-effects molecular clock model
UR - http://www.scopus.com/inward/record.url?scp=85092444332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092444332&partnerID=8YFLogxK
U2 - 10.1093/molbev/msaa130
DO - 10.1093/molbev/msaa130
M3 - Article
C2 - 32458974
AN - SCOPUS:85092444332
SN - 0737-4038
VL - 37
SP - 3047
EP - 3060
JO - Molecular biology and evolution
JF - Molecular biology and evolution
IS - 10
ER -