CS6046: Multi-armed bandits

Course information

  • When: Jan-May 2019

  • Lectures: Slot F

  • Where: CS26

  • Office Hours: Wednesday, 2:30pm to 3:30pm

Course Content

Exploration-exploitation dilemma in stochastic K-armed bandits

  • Explore-then-commit and epsilon-greedy strategies

  • Upper Confidence Bound (UCB) algorithm

  • Thompson sampling (TS) algorithm

  • Regret upper bounds for UCB and TS

  • Lower bounds: a) Information theoretic bounds on minimax error, b) Instance-dependent regret lower bounds

Pure exploration in K-armed bandits

  • Uniform sampling

  • Successive rejects algorithm

  • Lower bounds

Adversarial bandits

  • EXP3, EXP3-IX, EXP4 algorithms

  • Upper bounds on regret of EXP3 and its variants

  • Regret lower bounds

Advanced topics

  • Stochastic linear bandits: Ellipsoidal confidence sets, UCB, Thompson sampling and lower bound.

  • Adversarial linear bandits: Exp2 with John’s exploration, online mirror descent, lower bound.


  • Project: 35%

  • Mid-sem: 20%

  • Homeworks: 40%

  • Paper presentation: 5%

Important Dates

Available on Submission by
Homework 1 Feb 4 Feb 16
Mid-sem Mar 10
Homework 3 Mar 15 Mar 31
Project proposal Apr 2
Project report Apr 30

Paper presentation

Each individual student is expected to read a paper from the list here and present the main contributions of the chosen paper and (hopefully) get the class interested enough to discuss more on the presented paper. The paper presentation shall happen in the regular class hours during the month of April.

Course project

  • The project could be theoretical and/or practical, i.e., the course project could involve implementation of existing bandit algorithms on a sophisticated benchmark (e.g. recommendation systems) and/or original research in the form of a novel bandit formulation or extension of existing theoretical results. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing result may not be completely feasible and in that case, the project report could turn into a survey of existing literature that the students familiarized with, while attempting to formulate/solve a bandit problem.

  • For theoretically-oriented projects, the list of papers given here can be a good place to start exploring. For the practically-oriented ones, some pointers are available here.

  • For theoretically-oriented projects, the report is expected to provide a detailed description of the problem and accompanying algorithms. For the projects geared towards performance evaluation of existing bandit algorithms, the report is expected to summarize the findings of the empirical investigation of state-of-the-art bandit algorithms in one of the application domains.

The evaluation would be as follows: 15% for the project proposal, 25% for the final report and 10% for an oral examination by the instructor.


Schedule (from 2018)

Regret minimization in MAB (Click here for course notes)
Lecture 1 Course Organization
Motivation for studying bandits
Lecture 2 Bandit learning framework
Regret definition, equivalent expression using gaps
Lecture 3 Explore-then-commit (ETC) algorithm
a brief tour of concentration inequalities: Subgaussianity
Lecture 4 Finite sample bounds using Chernoff method
end of concentration inequalities tour
Lecture 5 Gap dependent regret bound for ETC 1.4
Lecture 6 Gap-independent regret bound for ETC 1.4
Lecture 7 UCB algorithm: introduction 1.5
Lecture 8 UCB: gap-dependent regret bound 1.5
Lecture 9 UCB: gap-independent regret bound 1.5
Lecture 11 A brief tour of information theory:
Entropy, joint and conditional entropy, chain rule
Lecture 12 KL-divergence: Definition, non-negativity, chain rule 1.6
Lecture 13 Pinsker's inequality, end of information theory tour 1.6
Lecture 14 Regret lower bound for 2-armed bandits 1.7
Lecture 15 Regret lower bound for K>2 arms 1.7
Lecture 16 Instance dependent (asymptotic) lower bound 1.7
Lecture 17 A brief tour of Bayesian statistics:
Parameter estimation (Bernoulli case)
Lecture 18 Parameter estimation - case of Normal mean 1.8
Lecture 19 Bayesian bandit framework and Bayesian regret
Thompson sampling aka posterior sampling
Lecture 19 Thompson sampling: Bayesian regret bound 1.9
Lecture 20 Thompson sampling: Bayesian regret bound 1.9
Lecture 21 Brief introduction to KL-UCB, UCB-V, UCB-improved, deviations of regret
Lecture 22 Brief introduction to frequentist regret bounds of Thompson sampling
Best arm identification in MAB
Lecture 1 Best arm identification (BAI):
Fixed budget setting
Uniform sampling
Lecture 2 Successive rejects (SR) algorithm 2.2
Lecture 3 Analysis of error probability of SR 2.2
Lecture 4 Lower bound for BAI in fixed budget setting 2.3
Lecture 5 Lower bound for BAI in fixed budget setting 2.3
Lecture 6 Lower bound for BAI in fixed budget setting 2.3
Lecture 7 Lower bound for BAI in fixed budget setting 2.3
Lecture 8 Fixed confidence BAI setting
Naive uniformly exploring algorithm
Lecture 9 Successive elimination with known gaps 2.5
Lecture 9 Successive elimination with unknown gaps 2.5
Lecture 10 Analysis of successive elimination algorithm 2.5
Lecture 11 Lower bound for fixed confidence BAI 2.6
Lecture 12 Median elimination and a few recent algorithms
(Stochastic) Linear bandits
Lecture 1 Linear bandit formulation, Confidence ellipsoid
Lecture 2 LinUCB algorithm
Lecture 3 Sketch of regret bound
Adversarial bandits
Lecture 1 Prediction with expert advice
Halving algorithm, mistake bound
Lecture 2 Weighted majority algorithm, mistake bound
Lecture 3 Exponential weighted majority algorithm
Lecture 4 Regret bound for EWM algorithm
Lecture 5 Adversarial bandits: framework
EXP3 algorithm
Lecture 6 Regret bound for EXP3 algorithm
Lecture 7 EXP3.P algorithm, high-probability regret bound
Lecture 8 EXP3-IX algorithm, high-probability regret bound
Lecture 9 A survey of other adversarial bandit settings