Abstract
Efficiently solving many different optimal control tasks within the same underlying environment requires decomposing the environment into its computationally elemental fragments. We suggest how to find such fragmentations by applying unsupervised mixture-model learning to data derived from the optimal value functions of multiple tasks, and show that the resulting fragmentations accord with observable structure in the environments. Further, we present evidence that such fragments can be of use in a practical reinforcement learning context, by facilitating online, actor-critic learning of multiple-goal MDPs.
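The core idea of the abstract can be illustrated with a minimal sketch: compute optimal value functions for several goals in a toy environment, treat each state's vector of values across tasks as a feature vector, and cluster those vectors to recover environment fragments. Everything here is an assumption for illustration (the corridor environment, the goal placements, and the use of simple k-means in place of the paper's mixture-model fit):

```python
import numpy as np

# Hypothetical 1-D corridor of N states with deterministic left/right moves,
# reward -1 per step until the goal is reached. Each task places the goal at
# a different state (goal locations are assumed, not from the paper).
N, GAMMA = 20, 0.9
GOALS = [0, 9, 19]

def optimal_values(goal, n=N, gamma=GAMMA, iters=500):
    """Value iteration for a single shortest-path-to-goal task."""
    V = np.zeros(n)
    for _ in range(iters):
        left = V[np.maximum(np.arange(n) - 1, 0)]
        right = V[np.minimum(np.arange(n) + 1, n - 1)]
        V_new = -1.0 + gamma * np.maximum(left, right)
        V_new[goal] = 0.0  # absorbing goal state
        V = V_new
    return V

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means stand-in for the paper's mixture-model clustering."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each state to the nearest cluster center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Each state is described by its optimal values across all tasks.
V = np.stack([optimal_values(g) for g in GOALS], axis=1)  # shape (N, num_tasks)
fragments = kmeans(V, k=3)
print(fragments)  # cluster label per state: a candidate fragmentation
```

States whose value profiles vary together across tasks land in the same cluster, so the corridor partitions into contiguous segments, which is the kind of "fragmentation in accord with observable structure" the abstract refers to.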
Original language | English (US) |
---|---|
Pages (from-to) | 325-346 |
Number of pages | 22 |
Journal | Machine Learning |
Volume | 49 |
Issue number | 2-3 |
DOIs | |
State | Published - Nov 2002 |
Externally published | Yes |
Keywords
- Density estimation
- Dynamic programming
- Mixture models
- Reinforcement learning
- Unsupervised learning
- Value functions
ASJC Scopus subject areas
- Software
- Artificial Intelligence