Risk-sensitive Reinforcement Learning via Policy Gradient Search

Upcoming tutorial at AAAI, 2023

Tutorial Description

The objective in traditional reinforcement learning (RL) is usually the expected value of a cumulative cost function, which does not account for risk. In this tutorial, we consider risk-sensitive RL in two settings: one where the goal is to find a policy that optimizes the usual expected-value objective while satisfying a risk constraint, and another where the risk measure itself is the objective. We focus on policy gradient search as the solution approach.

The main purpose of this tutorial is thus to introduce and survey research results on policy gradient methods for reinforcement learning with risk-sensitive criteria, as well as to outline some promising avenues for future research within the risk-sensitive RL framework.
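As a flavor of the second setting (risk as the objective), the sketch below applies a likelihood-ratio policy gradient to minimize the Conditional Value-at-Risk (CVaR) of a stochastic cost in a toy one-step problem. The problem, the Gaussian policy, and all constants are illustrative choices for this sketch, not material from the tutorial itself; CVaR is estimated from the worst alpha-fraction of sampled costs.

```python
import numpy as np

def cvar_policy_gradient(num_iters=2000, batch=200, alpha=0.1,
                         sigma=0.5, lr=0.05, seed=0):
    """Minimize CVaR_alpha of a stochastic cost with a Gaussian policy.

    Toy one-step problem (illustrative): action a ~ N(theta, sigma^2),
    cost C(a) = (a - 2)^2 + noise. A likelihood-ratio gradient estimate
    averages score * (cost - VaR) over the worst alpha-fraction of
    sampled costs, then theta is updated by stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(num_iters):
        actions = theta + sigma * rng.standard_normal(batch)
        costs = (actions - 2.0) ** 2 + 0.1 * rng.standard_normal(batch)
        var_level = np.quantile(costs, 1.0 - alpha)   # empirical VaR
        tail = costs >= var_level                     # worst alpha-fraction
        score = (actions - theta) / sigma ** 2        # grad_theta log N(theta, sigma^2)
        grad = np.sum(score[tail] * (costs[tail] - var_level)) / (alpha * batch)
        theta -= lr * grad                            # descend the CVaR estimate
    return theta
```

On this toy cost, the iterates settle near theta = 2, the action that minimizes the tail cost. In the constrained setting, the same score-function machinery is typically combined with a Lagrangian relaxation of the risk constraint.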

Tutorial Outline

  • Tutorial Overview

  • Review of MDPs/RL

  • Risk Measures

  • Background

  • Policy Gradient Templates for Risk-sensitive RL

  • MDPs with Risk as the Constraint

  • MDPs with Risk as the Objective

Slides

Coming soon

Presenters

Prashanth L.A. and Michael C. Fu will present this tutorial.

Target Audience

The target audience includes both researchers and practitioners who study and/or use reinforcement learning (RL) in their work and who wish to incorporate risk measures or behavioral considerations into their decision-making process. The background needed for this tutorial can be found in a first course in RL and optimization.

Speaker Bios


Prashanth L.A. is an Assistant Professor in the Department of Computer Science and Engineering at Indian Institute of Technology Madras. His research interests are in reinforcement learning, stochastic optimization and multi-armed bandits, with applications in transportation systems, wireless networks and recommendation systems.


Michael C. Fu holds the Smith Chair of Management Science at the University of Maryland. His research interests include simulation optimization and applied probability, particularly with applications towards supply chain management and financial engineering.