Mon.2 13:45–15:00 | H 0110 | STO

Stochastic Approximation and Reinforcement Learning (2/3)

Chair: Alec Koppel | Organizer: Alec Koppel
13:45

Stephen Tu

joint work with Benjamin Recht

The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint

We study the sample complexity of popular model-based and model-free algorithms on the Linear Quadratic Regulator (LQR). For policy evaluation, we show that a simple model-based plugin method requires asymptotically fewer samples than least-squares temporal difference (LSTD) learning to reach comparable performance; the sample complexity gap between the two methods is at least a factor of the state dimension. For simple problem instances, nominal (certainty equivalence principle) control requires several factors of the state and input dimensions fewer samples than the policy gradient method to achieve similar performance.
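
To fix ideas, the following is a minimal sketch of a model-based plugin approach to LQR policy evaluation; it is not code from the talk, and the dynamics, policy, costs, and noise levels are hypothetical choices of the editor. The idea it illustrates: fit (A, B) by least squares from rollouts under a fixed linear policy, then evaluate that policy on the estimated model by solving a Lyapunov equation for the value matrix P with V(x) = x' P x.

# Minimal sketch (hypothetical problem data, not from the talk):
# model-based plugin policy evaluation for LQR.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)
n, d = 3, 2                                    # state and input dimensions
A = 0.8 * np.eye(n) + 0.02 * rng.standard_normal((n, n))   # true (unknown) dynamics
B = rng.standard_normal((n, d))
K = -0.02 * rng.standard_normal((d, n))        # fixed policy to evaluate: u = K x
Q, R = np.eye(n), np.eye(d)                    # quadratic stage cost x'Qx + u'Ru

# Collect transitions (x, u, x') under the policy plus exploratory input noise.
X, U, Xn = [], [], []
x = np.zeros(n)
for _ in range(2000):
    u = K @ x + 0.1 * rng.standard_normal(d)
    x_next = A @ x + B @ u + 0.05 * rng.standard_normal(n)
    X.append(x); U.append(u); Xn.append(x_next)
    x = x_next
X, U, Xn = map(np.array, (X, U, Xn))

# Plugin step 1: least-squares estimate of [A B] from the regression x' ~ [A B][x; u].
Z = np.hstack([X, U])                          # (T, n + d) regressors
AB_hat, *_ = np.linalg.lstsq(Z, Xn, rcond=None)
A_hat, B_hat = AB_hat.T[:, :n], AB_hat.T[:, n:]

# Plugin step 2: evaluate the policy on the estimated model by solving the Lyapunov
# equation P = Acl' P Acl + Q + K'RK for the closed-loop value matrix.
Acl_hat = A_hat + B_hat @ K
P_hat = solve_discrete_lyapunov(Acl_hat.T, Q + K.T @ R @ K)
print("estimated value matrix P:\n", P_hat)

An LSTD estimate of the same P would instead regress on quadratic features of the state; the abstract's claim is that the plugin route above needs asymptotically fewer samples, by at least a factor of the state dimension.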

14:10

Hoi-To Wai

joint work with Belhal Karimi, Błażej Miasojedow, Eric Moulines

Non-asymptotic Rate for Biased Stochastic Approximation Scheme

In this talk, we analyze a general stochastic approximation (SA) scheme for minimizing a non-convex, smooth objective function. We consider an update procedure whose drift term depends on a state-dependent Markov chain and whose mean field is not necessarily of gradient type; it is therefore a biased SA scheme. Importantly, we provide the first non-asymptotic convergence rate under this dynamical setting. We illustrate the setting with the policy gradient method for average-reward maximization, highlighting its performance tradeoff between bias and variance.
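
Below is a minimal sketch of the kind of iteration the abstract describes; it is the editor's illustration, not the talk's algorithm, and the chain, drift term, and step sizes are hypothetical. The point it shows: the samples X_k come from a Markov chain whose kernel depends on the current iterate, so each drift evaluation H(theta, X) is a biased estimate of the mean field h(theta) under the chain's stationary distribution.

# Minimal sketch (hypothetical example, not from the talk) of a biased SA loop
# driven by a state-dependent Markov chain.
import numpy as np

rng = np.random.default_rng(1)

def markov_step(x, theta):
    # One step of a state-dependent AR(1) chain: its stationary mean drifts with
    # theta, which is what makes the per-iteration drift biased.
    return 0.8 * x + 0.2 * np.tanh(theta) + 0.1 * rng.standard_normal()

def drift(theta, x):
    # Drift term H(theta, x); averaging it under the chain's stationary law gives
    # the mean field h(theta) (in the talk's setting, h need not be a gradient).
    return theta - x

theta, x = 2.0, 0.0
for k in range(1, 5001):
    x = markov_step(x, theta)          # state-dependent Markovian sample
    gamma = 0.5 / np.sqrt(k)           # diminishing step size
    theta = theta - gamma * drift(theta, x)
print("final iterate:", theta)

In the policy gradient instance mentioned in the abstract, theta would be the policy parameter and the chain the state-action trajectory it induces; longer rollouts shrink the bias of the drift estimate at the cost of higher variance, which is the tradeoff the talk highlights.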