Online NPTEL Courses
Week 0 - Preparatory Material
Week 1 - Introduction to RL and Immediate RL
Week 2 - Bandit Algorithms
- Assignment 2
- Solution 2
- Auer, P.; Cesa-Bianchi, N.; Fischer, P. 2002. Finite-time Analysis of the Multiarmed Bandit Problem.
- Auer, P.; Ortner, R. 2010. UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem.
- Even-Dar, E.; Mannor, S.; Mansour, Y. 2006. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems.
- Tutorial on OFUL (Szepesvari, C.) Part 1 Part 2 Part 3
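The UCB1 algorithm analyzed in Auer, Cesa-Bianchi & Fischer (2002), cited above, can be sketched in a few lines: play each arm once, then always pull the arm with the largest empirical mean plus a confidence bonus. The bandit instance below (Bernoulli arms with means 0.3, 0.8, 0.5) is a hypothetical example chosen for illustration.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """UCB1: pull the arm maximizing mean reward + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms      # pulls per arm
    sums = [0.0] * n_arms      # total reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # initialize: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        counts[arm] += 1
        sums[arm] += pull(arm)
    return counts

# Hypothetical 3-armed Bernoulli bandit; arm 1 is best (mean 0.8).
random.seed(0)
means = [0.3, 0.8, 0.5]
counts = ucb1(lambda i: 1.0 if random.random() < means[i] else 0.0,
              n_arms=3, horizon=2000)
```

Over 2000 rounds the best arm accumulates most of the pulls, while each suboptimal arm is pulled only O(log T / Δ²) times, matching the finite-time regret bound of the paper.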
Week 3 - Policy Gradient Methods & Introduction to Full RL
Week 4 - MDP Formulation, Bellman Equations & Optimality Proofs
Week 5 - Dynamic Programming & Monte Carlo Methods
Week 6 - Monte Carlo & Temporal Difference Methods
Week 7 - Eligibility Traces
Week 8 - Function Approximation
Week 9 - DQN, Fitted Q & Policy Gradient Approaches
Week 10 - Hierarchical Reinforcement Learning
- Assignment 10
- Solution 10
- Andrew G. Barto and Sridhar Mahadevan. 2003. Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems 13, 1-2 (January 2003), 41-77. DOI: https://doi.org/10.1023/A:1022140919877
Week 11 - Hierarchical RL: MAXQ
- Assignment 11
- Solution 11
- Dietterich, T. G. 2000. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition.
Week 12 - POMDPs
Reinforcement Learning
Reinforcement learning is a paradigm that aims to model the trial-and-error learning process that is needed in many problem situations where explicit instructive signals are not available. It has roots in operations research, behavioral psychology and AI. The goal of the course is to introduce the basic mathematical foundations of reinforcement learning, as well as highlight some of the recent directions of research.
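The trial-and-error idea can be seen in a minimal tabular Q-learning sketch: the agent never receives an instructive signal saying which action was correct, only a reward, yet it learns a good policy. The environment here (a hypothetical 5-state chain with a goal at the right end) and all constants are illustrative choices, not part of the course material.

```python
import random

# Hypothetical 1-D chain: states 0..4, actions 0 = left, 1 = right,
# reward 1.0 only on reaching the goal state 4.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # step size, discount, exploration rate

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                 # episodes of trial and error
    s = 0
    while s != GOAL:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < EPS \
            else max((0, 1), key=lambda x: Q[s][x])
        s2, r = step(s, a)
        # Q-learning update: bootstrap on the best next-state value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
```

After training, the greedy policy moves right in every non-goal state, even though no state was ever labeled with its correct action.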
We provide two alternative ways of streaming the videos:
- the default YouTube player
- the feature-rich Videoken player