TY - JOUR

T1 - Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

AU - Wang, Mengdi

AU - Fang, Ethan X.

AU - Liu, Han

N1 - Publisher Copyright:
© 2016, Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society.

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., the composition of two expected-value functions: the problem min_x E_v[f_v(E_w[g_w(x)])]. To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of f_v and g_w, and uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution of a convex optimization problem, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^{-1/4}) in the general case and O(k^{-2/3}) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^{-2/7}) in the general case and O(k^{-4/5}) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we also provide a convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.

AB - Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., the composition of two expected-value functions: the problem min_x E_v[f_v(E_w[g_w(x)])]. To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of f_v and g_w, and uses an auxiliary variable to track the unknown quantity E_w[g_w(x)]. We prove that SCGD converges almost surely to an optimal solution of a convex optimization problem, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of O(k^{-1/4}) in the general case and O(k^{-2/3}) in the strongly convex case, after taking k samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of O(k^{-2/7}) in the general case and O(k^{-4/5}) in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we also provide a convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, etc.

KW - Convex optimization

KW - Sample complexity

KW - Simulation

KW - Statistical learning

KW - Stochastic gradient

KW - Stochastic optimization

UR - http://www.scopus.com/inward/record.url?scp=84966270105&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84966270105&partnerID=8YFLogxK

U2 - 10.1007/s10107-016-1017-3

DO - 10.1007/s10107-016-1017-3

M3 - Article

AN - SCOPUS:84966270105

SN - 0025-5610

VL - 161

SP - 419

EP - 449

JO - Mathematical Programming

JF - Mathematical Programming

IS - 1-2

ER -