Title: Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Speaker: Prashanth L A (Institute for Systems Research, University of Maryland, USA)
Details: Fri, 20 May 2016, 3:00 PM @ BSB 361
Abstract: Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges for CPT: estimating the CPT objective requires estimating the entire distribution of the value function, and the optimal policy may need to be randomized. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and empirically demonstrate their usefulness in a traffic signal control application.
* The majority of the talk will be accessible to a broad audience.
** The contributions will be emphasized in a more general active learning setting, with RL as a special case.
*** Joint work with Cheng Jie, Michael Fu, Steve Marcus and Csaba Szepesvari.
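To give a concrete sense of the estimation scheme mentioned in the abstract, the sketch below computes an empirical CPT-value from samples using sorted order statistics and differences of a probability weighting function. The utility/weighting forms and parameter values (alpha=0.88, lambda=2.25, eta=0.61/0.69) follow Tversky and Kahneman's standard parameterization and are illustrative assumptions, not necessarily the exact choices made in the talk.

```python
import numpy as np

def weight(p, eta):
    """Tversky-Kahneman probability weighting function (assumed form)."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)

def cpt_value_estimate(samples, eta_pos=0.61, eta_neg=0.69, alpha=0.88, lam=2.25):
    """Empirical CPT-value estimate of a random variable from i.i.d. samples.

    Gains and losses are treated separately: utilities are applied, the
    values are sorted, and each order statistic is weighted by a difference
    of the (distorted) tail probabilities.
    """
    x = np.asarray(samples, dtype=float)
    n = len(x)
    # Utility-transformed gains u^+(x) = x^alpha and losses u^-(x) = lam*(-x)^alpha
    gains = np.sort(np.maximum(x, 0.0) ** alpha)
    losses = np.sort(lam * np.maximum(-x, 0.0) ** alpha)
    # Weight differences w((n-i+1)/n) - w((n-i)/n) for i = 1..n (telescopes to 1)
    i = np.arange(1, n + 1)
    dw_pos = weight((n - i + 1) / n, eta_pos) - weight((n - i) / n, eta_pos)
    dw_neg = weight((n - i + 1) / n, eta_neg) - weight((n - i) / n, eta_neg)
    return float(np.dot(gains, dw_pos) - np.dot(losses, dw_neg))
```

For a degenerate (constant) positive sample, the weight differences telescope to one, so the estimate reduces to the utility of that constant; randomness in the samples is where the probability distortion has an effect. In the talk's setting, an estimator of this kind sits in the inner loop of an SPSA-based policy search.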
Bio: Prashanth L.A. is currently a postdoctoral researcher at the Institute for Systems Research, University of Maryland - College Park. Prior to this, he was a postdoctoral researcher at INRIA Lille - Team SequeL from 2012 to 2014. From 2002 to 2009, he was with Texas Instruments (India) Pvt Ltd, Bangalore, India.
He received his Master's and Ph.D. degrees in Computer Science and Automation from the Indian Institute of Science, in 2008 and 2013, respectively. He was awarded the third prize for his Ph.D. dissertation by the IEEE Intelligent Transportation Systems Society (ITSS). He is the coauthor of a book entitled "Stochastic Recursive Algorithms for Optimization: Simultaneous Perturbation Methods", published by Springer in 2013. His research interests are in reinforcement learning, stochastic optimization and multi-armed bandits, with applications in transportation systems, wireless networks and recommendation systems.