Markov decision processes with slow scale periodic decisions

M. Jacobson; N. Shimkin; A. Shwartz

doi:10.1287/moor.28.4.777.20517

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

We consider a class of discrete time, dynamic decision-making models which we refer to as Periodically Time-Inhomogeneous Markov Decision Processes (PTMDPs). In these models, the decision-making horizon can be partitioned into intervals, called slow scale cycles, of N + 1 epochs. The transition law and reward function are time-homogeneous over the first N epochs of each slow scale cycle, but distinct at the final epoch. The motivation for such models is in applications where decisions of different nature are taken at different time scales, i.e., many "low-level" decisions are made between less frequent "high-level" ones. For the PTMDP model, we consider the problem of optimizing the expected discounted reward when rewards devalue by a discount factor λ at the beginning of each slow scale cycle. When N is large, initially stationary policies (i.s.p.'s) are natural candidates for optimal policies. Similar to turnpike policies, an initially stationary policy uses the same decision rule for some large number of epochs in each slow scale cycle, followed by a relatively short planning horizon of time-varying decision rules. In this paper, we characterize the form of the optimal value as a function of N, establish conditions ensuring the existence of near-optimal i.s.p.'s, and characterize their structure. Our analysis deals separately with the cases where the time-homogeneous part of the system has state-dependent and state-independent optimal average reward. As we illustrate, the results in these two distinct cases are qualitatively different.
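The cycle structure described in the abstract can be made concrete with a small dynamic-programming sketch. This is not the authors' method. It is plain value iteration under stated assumptions: finite state and action sets; transitions and rewards stored as nested lists under hypothetical names (P, r for the N homogeneous epochs, Q, q for the distinct final epoch); and "rewards devalue by λ at the beginning of each slow scale cycle" read as weighting every reward earned in cycle m by λ^m, with no discounting inside a cycle. Under that reading, one cycle is N undiscounted fast backups preceded (in backward order) by one slow backup that applies λ. The returned per-epoch decision rules can be inspected to see whether the early epochs of a cycle share a single rule, the initially stationary pattern the abstract refers to. The sketch does not establish near-optimality of such policies.

```python
from typing import List, Tuple

# Hypothetical encoding (not taken from the paper):
#   P[a][s][t]: fast-scale transition probability s -> t under action a (epochs 0..N-1)
#   r[s][a]:    fast-scale reward for action a in state s
#   Q[b][s][t]: transition probability at the final (slow-scale) epoch N under action b
#   q[s][b]:    reward at the final epoch N
Matrix = List[List[float]]


def _backup(V: List[float], T: List[Matrix], R: Matrix,
            scale: float) -> Tuple[List[float], List[int]]:
    """One Bellman backup: W(s) = max_a [R[s][a] + scale * sum_t T[a][s][t] * V(t)]."""
    n = len(V)
    W: List[float] = []
    rule: List[int] = []
    for s in range(n):
        vals = [R[s][a] + scale * sum(T[a][s][t] * V[t] for t in range(n))
                for a in range(len(T))]
        best = max(range(len(vals)), key=vals.__getitem__)  # index of the best action
        W.append(vals[best])
        rule.append(best)
    return W, rule


def cycle_operator(V, P, r, Q, q, N, lam):
    """Map the value at the start of the next cycle to the value at the start of this one.

    Assumption: rewards in cycle m are weighted by lam**m, with no discounting inside a cycle.
    Returns the new value and the decision rules for epochs 0..N (epoch 0 first).
    """
    W, slow_rule = _backup(V, Q, q, lam)  # final epoch N: the only place lam enters
    rules = [slow_rule]
    for _ in range(N):  # fast epochs N-1, ..., 0, undiscounted within the cycle
        W, rule = _backup(W, P, r, 1.0)
        rules.append(rule)
    rules.reverse()
    return W, rules


def solve_ptmdp(P, r, Q, q, N, lam, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration on the cycle operator, which is a lam-contraction in sup norm."""
    V = [0.0] * len(r)
    for _ in range(max_iter):
        V_new, rules = cycle_operator(V, P, r, Q, q, N, lam)
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new, rules
        V = V_new
    raise RuntimeError("value iteration did not converge")


# Toy instance with made-up numbers: one state, one action per time scale.
V, rules = solve_ptmdp(P=[[[1.0]]], r=[[1.0]], Q=[[[1.0]]], q=[[0.0]], N=3, lam=0.5)
# V[0] is close to 3 / (1 - 0.5) = 6: three unit rewards per cycle, discounted once per cycle.
```

The fast backups use no discount, so each is nonexpansive in the sup norm; only the slow backup contracts by λ. This is why the iteration converges for any λ < 1 regardless of how large N is, while the cost of one cycle grows linearly in N.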

Original language	English (US)
Pages (from-to)	777-800
Number of pages	24
Journal	Mathematics of Operations Research
Volume	28
Issue number	4
DOIs	https://doi.org/10.1287/moor.28.4.777.20517
State	Published - Nov 2003
Externally published	Yes

Keywords

Cyclo-stationary
Discounted cost
Multi-class models
Multiple time scales
Periodic time-inhomogeneity
Turnpike

ASJC Scopus subject areas

General Mathematics
Computer Science Applications
Management Science and Operations Research

Cite this

@article{66889096fd9e488e88dc4cc48842ba39,

title = "Markov decision processes with slow scale periodic decisions",

abstract = "We consider a class of discrete time, dynamic decision-making models which we refer to as Periodically Time-Inhomogeneous Markov Decision Processes (PTMDPs). In these models, the decision-making horizon can be partitioned into intervals, called slow scale cycles, of N + 1 epochs. The transition law and reward function are time-homogeneous over the first N epochs of each slow scale cycle, but distinct at the final epoch. The motivation for such models is in applications where decisions of different nature are taken at different time scales, i.e., many {"}low-level{"} decisions are made between less frequent {"}high-level{"} ones. For the PTMDP model, we consider the problem of optimizing the expected discounted reward when rewards devalue by a discount factor λ at the beginning of each slow scale cycle. When N is large, initially stationary policies (i.s.p.'s) are natural candidates for optimal policies. Similar to turnpike policies, an initially stationary policy uses the same decision rule for some large number of epochs in each slow scale cycle, followed by a relatively short planning horizon of time-varying decision rules. In this paper, we characterize the form of the optimal value as a function of N, establish conditions ensuring the existence of near-optimal i.s.p.'s, and characterize their structure. Our analysis deals separately with the cases where the time-homogeneous part of the system has state-dependent and state-independent optimal average reward. As we illustrate, the results in these two distinct cases are qualitatively different.",

keywords = "Cyclo-stationary, Discounted cost, Multi-class models, Multiple time scales, Periodic time-inhomogeneity, Turnpike",

author = "M. Jacobson and N. Shimkin and A. Shwartz",

year = "2003",

month = nov,

doi = "10.1287/moor.28.4.777.20517",

language = "English (US)",

volume = "28",

pages = "777--800",

journal = "Mathematics of Operations Research",

issn = "0364-765X",

publisher = "INFORMS Inst. for Operations Res. and the Management Sciences",

number = "4",

}

TY - JOUR

T1 - Markov decision processes with slow scale periodic decisions

AU - Jacobson, M.

AU - Shimkin, N.

AU - Shwartz, A.

PY - 2003/11

Y1 - 2003/11

N2 - We consider a class of discrete time, dynamic decision-making models which we refer to as Periodically Time-Inhomogeneous Markov Decision Processes (PTMDPs). In these models, the decision-making horizon can be partitioned into intervals, called slow scale cycles, of N + 1 epochs. The transition law and reward function are time-homogeneous over the first N epochs of each slow scale cycle, but distinct at the final epoch. The motivation for such models is in applications where decisions of different nature are taken at different time scales, i.e., many "low-level" decisions are made between less frequent "high-level" ones. For the PTMDP model, we consider the problem of optimizing the expected discounted reward when rewards devalue by a discount factor λ at the beginning of each slow scale cycle. When N is large, initially stationary policies (i.s.p.'s) are natural candidates for optimal policies. Similar to turnpike policies, an initially stationary policy uses the same decision rule for some large number of epochs in each slow scale cycle, followed by a relatively short planning horizon of time-varying decision rules. In this paper, we characterize the form of the optimal value as a function of N, establish conditions ensuring the existence of near-optimal i.s.p.'s, and characterize their structure. Our analysis deals separately with the cases where the time-homogeneous part of the system has state-dependent and state-independent optimal average reward. As we illustrate, the results in these two distinct cases are qualitatively different.

AB - We consider a class of discrete time, dynamic decision-making models which we refer to as Periodically Time-Inhomogeneous Markov Decision Processes (PTMDPs). In these models, the decision-making horizon can be partitioned into intervals, called slow scale cycles, of N + 1 epochs. The transition law and reward function are time-homogeneous over the first N epochs of each slow scale cycle, but distinct at the final epoch. The motivation for such models is in applications where decisions of different nature are taken at different time scales, i.e., many "low-level" decisions are made between less frequent "high-level" ones. For the PTMDP model, we consider the problem of optimizing the expected discounted reward when rewards devalue by a discount factor λ at the beginning of each slow scale cycle. When N is large, initially stationary policies (i.s.p.'s) are natural candidates for optimal policies. Similar to turnpike policies, an initially stationary policy uses the same decision rule for some large number of epochs in each slow scale cycle, followed by a relatively short planning horizon of time-varying decision rules. In this paper, we characterize the form of the optimal value as a function of N, establish conditions ensuring the existence of near-optimal i.s.p.'s, and characterize their structure. Our analysis deals separately with the cases where the time-homogeneous part of the system has state-dependent and state-independent optimal average reward. As we illustrate, the results in these two distinct cases are qualitatively different.

KW - Cyclo-stationary

KW - Discounted cost

KW - Multi-class models

KW - Multiple time scales

KW - Periodic time-inhomogeneity

KW - Turnpike

UR - http://www.scopus.com/inward/record.url?scp=3142750477&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3142750477&partnerID=8YFLogxK

U2 - 10.1287/moor.28.4.777.20517

DO - 10.1287/moor.28.4.777.20517

M3 - Article

AN - SCOPUS:3142750477

SN - 0364-765X

VL - 28

SP - 777

EP - 800

JO - Mathematics of Operations Research

JF - Mathematics of Operations Research

IS - 4

ER -