CS6046 - Multi-armed bandits

Course Data:

Course Objective

The objective of this course is to cover basic topics in multi-armed bandits, with particular emphasis on the popular exploration-exploitation dilemma and the best-arm identification (or pure exploration) paradigms. The bulk of the course is geared towards familiarizing the student with the rigorous mathematical foundations of a variety of popular bandit models in the finite-armed setting, while the later part of the course covers advanced topics such as linear bandit models, where the underlying arms belong to a large (possibly infinite) set.

Course Contents:

Part 0: Recap of concentration inequalities (Lectures 1-3)
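
As a small illustration of the material in this recap, the following minimal Python sketch (illustrative only, not course material) compares the empirical tail frequency of a Bernoulli sample mean against Hoeffding's bound P(|mean - mu| >= eps) <= 2*exp(-2*n*eps^2); all parameter values below are arbitrary choices.

    import numpy as np

    # Hoeffding: for n i.i.d. samples bounded in [0, 1] with mean mu,
    # P(|sample mean - mu| >= eps) <= 2 * exp(-2 * n * eps^2).
    rng = np.random.default_rng(0)
    n, eps, trials, mu = 100, 0.1, 100_000, 0.5

    sample_means = rng.binomial(1, mu, size=(trials, n)).mean(axis=1)
    empirical = np.mean(np.abs(sample_means - mu) >= eps)
    bound = 2 * np.exp(-2 * n * eps**2)

    print(f"empirical tail frequency: {empirical:.4f}")  # noticeably below the bound
    print(f"Hoeffding upper bound:    {bound:.4f}")      # 2*exp(-2), about 0.27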

Part I: Exploration-exploitation dilemma in stochastic K-armed bandits (Lectures 4-15)
  • Explore-then-commit and ε-greedy strategies
  • Upper Confidence Bound (UCB) algorithm (a code sketch follows this list)
  • Thompson sampling (TS) algorithm
  • Regret upper bounds for UCB and TS
  • Lower bounds: a) Information-theoretic bounds on minimax error, b) Instance-dependent regret lower bounds
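
As a flavour of Part I, here is a minimal Python sketch of a UCB-style index policy on Bernoulli arms, using the classical index "empirical mean + sqrt(2 log t / pulls)"; the environment, constants, and names are illustrative rather than a prescribed implementation.

    import math
    import numpy as np

    def ucb(means, horizon, seed=0):
        """Pull each arm once, then play the arm maximizing
        empirical mean + sqrt(2 log t / pulls)."""
        rng = np.random.default_rng(seed)
        k = len(means)
        counts = np.zeros(k)   # pulls per arm
        sums = np.zeros(k)     # cumulative reward per arm
        best = max(means)
        regret = 0.0
        for t in range(1, horizon + 1):
            if t <= k:
                arm = t - 1    # initialization round: one pull per arm
            else:
                index = sums / counts + np.sqrt(2 * math.log(t) / counts)
                arm = int(np.argmax(index))
            reward = rng.binomial(1, means[arm])   # Bernoulli reward
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]            # pseudo-regret: sum of gaps
        return regret

    print(ucb([0.5, 0.6, 0.7], horizon=10_000))

The regret upper bounds covered in this part explain why the printed pseudo-regret grows only logarithmically with the horizon.
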
Part II: Pure exploration in K-armed bandits (Lectures 16-18)
  • Uniform sampling
  • Successive rejects algorithm (a code sketch follows this list)
  • Upper bound on probability of error
  • Lower bound on probability of error
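
As a flavour of Part II, the sketch below follows the successive-rejects phase schedule (phase lengths proportional to (budget - K) / (log-bar(K) * (K + 1 - k))); function and variable names are illustrative choices.

    import numpy as np

    def successive_rejects(means, budget, seed=0):
        """Best-arm identification with a fixed budget: split the budget
        into K - 1 phases and drop the empirically worst arm each phase."""
        rng = np.random.default_rng(seed)
        k = len(means)
        log_bar = 0.5 + sum(1.0 / i for i in range(2, k + 1))
        active = list(range(k))
        counts, sums = np.zeros(k), np.zeros(k)
        n_prev = 0
        for phase in range(1, k):
            n_phase = int(np.ceil((budget - k) / (log_bar * (k + 1 - phase))))
            for arm in active:
                for _ in range(n_phase - n_prev):  # top up surviving arms
                    sums[arm] += rng.binomial(1, means[arm])
                    counts[arm] += 1
            n_prev = n_phase
            worst = min(active, key=lambda a: sums[a] / counts[a])
            active.remove(worst)                   # reject the worst arm
        return active[0]                           # recommended arm

    print(successive_rejects([0.4, 0.5, 0.7], budget=3000))
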
Part III: Adversarial bandits (Lectures 19-24)
  • EXP3, EXP3-IX, EXP4 algorithms (a sketch of EXP3 follows this list)
  • Upper bounds on regret of EXP3 and its variants
  • Regret lower bounds
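
As a flavour of Part III, here is a minimal Python sketch of EXP3 in its loss-based form, with unbiased importance-weighted loss estimates; the fixed learning rate eta and all names are illustrative simplifications.

    import numpy as np

    def exp3(loss_matrix, eta, seed=0):
        """Sample an arm from the exponential-weights distribution and
        update cumulative loss estimates by importance weighting."""
        rng = np.random.default_rng(seed)
        horizon, k = loss_matrix.shape
        cum_loss_est = np.zeros(k)
        total_loss = 0.0
        for t in range(horizon):
            shifted = cum_loss_est - cum_loss_est.min()   # numerical stability
            probs = np.exp(-eta * shifted)
            probs /= probs.sum()
            arm = rng.choice(k, p=probs)
            loss = loss_matrix[t, arm]
            total_loss += loss
            cum_loss_est[arm] += loss / probs[arm]        # unbiased estimate
        return total_loss

    # Illustrative run: arm 1 has the smallest average loss.
    rng = np.random.default_rng(1)
    losses = rng.uniform(size=(5000, 3)) * np.array([1.0, 0.5, 0.9])
    print(exp3(losses, eta=0.05))
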
Part IV: Advanced topics
  • Stochastic linear bandits (Lectures 25-30): Ellipsoidal confidence sets, UCB, Thompson sampling and lower bound (a sketch of the optimistic approach follows this list).
  • Adversarial linear bandits (Lectures 31-39): Exp2 with John’s exploration, online mirror descent, lower bound.
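
As a flavour of Part IV, the sketch below implements optimism for stochastic linear bandits: a ridge-regression estimate of the unknown parameter plus an ellipsoidal exploration bonus. For simplicity the exploration radius beta is a fixed constant rather than the theoretically calibrated confidence width, and all names are illustrative.

    import numpy as np

    def lin_ucb(actions, theta, horizon, lam=1.0, beta=2.0, seed=0):
        """Play the action maximizing <x, theta_hat> + beta * ||x||_{V^-1},
        where V is the regularized design matrix."""
        rng = np.random.default_rng(seed)
        d = actions.shape[1]
        V = lam * np.eye(d)                # regularized design matrix
        b = np.zeros(d)
        best = (actions @ theta).max()
        regret = 0.0
        for _ in range(horizon):
            V_inv = np.linalg.inv(V)
            theta_hat = V_inv @ b
            widths = np.sqrt(np.einsum('ij,jk,ik->i', actions, V_inv, actions))
            arm = int(np.argmax(actions @ theta_hat + beta * widths))
            x = actions[arm]
            reward = x @ theta + rng.normal(scale=0.1)   # noisy linear reward
            V += np.outer(x, x)
            b += reward * x
            regret += best - x @ theta                   # pseudo-regret
        return regret

    # Illustrative run with 5 random actions in R^3.
    rng = np.random.default_rng(2)
    A = rng.normal(size=(5, 3))
    print(lin_ucb(A, theta=np.array([0.5, -0.2, 0.3]), horizon=2000))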


Textbook:

Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. Foundations and Trends in Machine Learning, 2012.

References:

  • Csaba Szepesvári and Tor Lattimore. Blog posts on bandit theory, http://banditalgs.com/.
  • Nicolò Cesa-Bianchi and Gábor Lugosi. Prediction, Learning, and Games. Cambridge University Press, 2006.

Learning outcomes:

  • Develop a rigorous understanding of the mathematical foundations of classic bandit algorithms.
  • Implement bandit algorithms and relate the empirical findings to the bandit theory covered in the course.
  • Develop familiarity with state-of-the-art bandit models and/or contribute original research through project work.

Pre-Requisites

    None

Parameters

Credits: 3-1-0-0-0-8-12
Type: Elective
Date of Introduction: Aug 2017

