Title: Understanding Exploration Strategies in Model Based Reinforcement Learning
Speaker: Prasanna P (IITM)
Details: Tue, 7 Jun, 2016 2:00 PM @ BSB 361
Abstract: This seminar addresses the exploration problem in Reinforcement Learning (RL) from two possible perspectives of a learning agent -- asymptotic and finite-episode. A learning agent in an RL problem is assumed to exist in an unknown environment, where it is expected to learn an optimal behavior that yields the maximum pay-off.
As the environment is unknown and the objective is to maximize the pay-off, a learning agent is entangled in the dilemma of exploration versus exploitation; this dilemma is a signature of the RL approach. Most model-learning approaches address it, each with a different exploration strategy that provides tighter learning guarantees.
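The explore-exploit dilemma can be illustrated with a minimal Thompson Sampling loop on a two-armed Bernoulli bandit. This is a generic illustrative sketch, not an algorithm from the talk; the arm means and round count are arbitrary.

```python
import random

# Minimal Thompson Sampling on a 2-armed Bernoulli bandit (illustrative).
# Each arm keeps a Beta posterior over its success probability; acting
# greedily on a *sample* from the posterior balances exploration (uncertain
# arms occasionally sample high) with exploitation (well-estimated good
# arms usually sample high).
random.seed(0)

true_means = [0.3, 0.7]   # unknown to the agent
alpha = [1, 1]            # Beta prior: successes per arm
beta = [1, 1]             # Beta prior: failures per arm

for t in range(2000):
    # Draw a plausible mean for each arm from its posterior.
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(2)]
    arm = samples.index(max(samples))   # greedy with respect to the sample
    reward = 1 if random.random() < true_means[arm] else 0
    alpha[arm] += reward
    beta[arm] += 1 - reward

# Posterior-mean estimates: the better arm's estimate should end up near
# its true mean, since it attracts most of the pulls.
estimates = [alpha[a] / (alpha[a] + beta[a]) for a in range(2)]
print(estimates)
```

Over many rounds the posterior of the inferior arm stays wide but is sampled rarely, which is exactly the self-correcting exploration behavior Thompson Sampling is known for.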
This seminar discusses a novel value-based exploration strategy, Thompson Sampling with Exploration Bonus (TSEB), for an asymptotic agent. This work provides an intuitive exploration strategy and analyzes its theoretical guarantees. The approach yields a tighter PAC guarantee than existing model-learning approaches, one that can be extended to Thompson Sampling (TS).
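The exploration-bonus idea can be sketched by planning against an estimated model whose rewards are inflated with a count-based optimism term, so rarely visited state-action pairs look more attractive. This is a generic illustration of the bonus mechanism; the exact bonus form and analysis in TSEB may differ.

```python
import math

def bonus(count, scale=1.0):
    """Optimism bonus that shrinks as a state-action pair is visited more.
    (Illustrative 1/sqrt(n) form; not necessarily TSEB's exact bonus.)"""
    return scale / math.sqrt(max(count, 1))

def value_iteration(P, R, counts, gamma=0.9, iters=200):
    """Value iteration on an estimated model (P, R) with reward bonuses.
    P[s][a] is a next-state distribution, R[s][a] a scalar reward estimate,
    counts[s][a] the number of observations of (s, a)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [max(R[s][a] + bonus(counts[s][a])
                 + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
                 for a in range(len(P[s])))
             for s in range(n)]
    return V

# Toy 2-state MDP: action 0 stays put, action 1 moves to the other state.
# State 1 pays reward 1; (s=0, a=1) is barely explored, so its bonus is large.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 1.0]]
counts = [[100, 1], [100, 100]]
V = value_iteration(P, R, counts)
print(V)
```

The large bonus on the under-visited pair (s=0, a=1) makes moving toward state 1 look valuable, which is how optimism-in-the-face-of-uncertainty drives the agent to explore under-sampled regions.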
Existing model-learning approaches, including TSEB, are guaranteed to converge to the optimum only asymptotically. To address the need for finite-time strategies, we extend and adapt the asymptotic solutions to finite-episode and finite-lifetime RL agents. This seminar discusses the nature of the finite-episode problem in detail and provides a framework that works over a chosen asymptotic algorithm. This work, termed Structuring Finite-Episode Exploration in Markov Decision Processes (SFTE), discusses the convergence properties and learning guarantees that can be inherited from the chosen asymptotic algorithm.
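One way to picture a framework layered over a chosen asymptotic algorithm is a wrapper that tracks the remaining episode budget and modulates the base learner's exploration accordingly. The class names, interface, and tapering rule below are all hypothetical, for illustration only; they are not SFTE's actual design.

```python
import random

class EpsilonGreedyAgent:
    """Toy stand-in for an asymptotic learner: epsilon-greedy over
    fixed action-value estimates (hypothetical interface)."""
    def __init__(self, values, eps=0.5):
        self.values = values
        self.eps = eps

    def act(self, state, explore_scale=1.0):
        # Explore with probability eps, scaled down by the wrapper.
        if random.random() < self.eps * explore_scale:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

class FiniteEpisodeWrapper:
    """Runs a base agent under a fixed episode budget, tapering its
    exploration as the budget is consumed (illustrative mechanism)."""
    def __init__(self, base_agent, total_episodes):
        self.base = base_agent
        self.total = total_episodes
        self.done = 0

    def remaining_fraction(self):
        return 1.0 - self.done / self.total

    def act(self, state):
        return self.base.act(state, explore_scale=self.remaining_fraction())

    def end_episode(self):
        self.done += 1

random.seed(1)
agent = FiniteEpisodeWrapper(EpsilonGreedyAgent([0.2, 0.8]), total_episodes=10)
for _ in range(9):
    agent.end_episode()
# With 90% of the budget spent, the exploration probability falls to 0.05,
# so the agent almost always takes the greedy action (index 1).
print(agent.act(state=None))
```

The point of the sketch is only the layering: the finite-budget logic lives entirely in the wrapper, so the base asymptotic algorithm can be swapped out, mirroring the abstract's claim that guarantees are exported from the chosen algorithm.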
Further, the seminar supports the two approaches with experiments on simulated worlds, comparing them against other approaches. To conclude, the seminar discusses the challenges and the scope for further work on these problems.