CS6515: Stochastic optimization

Course information

When: Jan-May 2023
Lectures: Slot J
Where: ESB207B
Teaching Assistants: None

Course Content

Module 1: Introduction

Examples: Least squares, matrix factorization, SVM, logistic regression, MCMC, nonlinear urns, RL
Preliminaries: Selected topics in probability and linear algebra

Module 2: Foundations - Convexity and smoothness

Taylor's theorem, minima of smooth functions
Convex sets and functions

Module 3: Foundations - Stochastic approximation

ODE approach and asymptotic convergence

Module 4: Descent methods

Descent directions
Convergence and rates

Module 5: Stochastic gradient algorithms

Convergence analysis for convex case
Sum of smooth and strongly convex functions
Convegence for general case
Noise reduction, e.g. SVRG, SAG and iterate averaging
Second-order methods

Module 6: Zeroth order optimization

Gradient estimation
Convergence analysis

Module 7: Other topics (if time permits)

Fixed point iterations, RL, MCMC, co-ordinate descent

Textbooks/References

[1] Stephen J. Wright, and Benjamin Recht. Optimization for data analysis. Cambridge University Press, 2022.
[2] V.S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Texts and Readings in Mathematics, 2022.
[3] Sebastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 2015.
[4] Stephen P. Boyd, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

Grading

Mid-term exam: 25%
Final exam and viva: 20%
Project: 25%
Paper review: 5%
Scribing: 10%
Mini-quizzes: 15%

Important Dates

Mini-quizzes: Feb 27, Apr 3, Apr 17
Mid-term: Mar 9
Final: May 9
Project: Apr 27
Paper review: Apr 10

Paper review

Students form a team of size at most two, and read a paper on stochastic optimization, either from a list (to be shared) or a paper outside the list that has the instructor's approval. Each team submits a 1-page critical review that identifies the strengths and areas of improvement of the paper studied. A paraphrase of the paper content is strongly discouraged.

Course project

The students work in teams of size at most two.
The project could be theoretical and/or practical, i.e., the course project could involve implementation of existing stochastic optimization algorithms and/or original research in the form of either a novel stochastic optimization algorithm or extension of existing theoretical results for such algorithms. Given the limited timeframe of the course, working out a new result or a sufficiently novel extension of an existing result may not be completely feasible and in that case, the project report could turn into a survey of existing literature that the students familiarized with, while attempting to formulate/solve a stochastic optimization problem.
For theoretically-oriented projects, the report is expected to provide a detailed description of the problem, accompanying algorithms and convergence analysis. For the projects geared towards performance evaluation of existing stochastic optimization algorithms, the report is expected to summarize the findings of the empirical investigation, with sufficient implementation details that would help reproduce the results.

The evaluation would be as follows: 5% for the project proposal, 10% for the final report and 10% for a project presentation/viva by the instructor.

Schedule

Lecture number	Topics Covered	Reference
Lecture 1	Motivating examples: Linear models in ML	Chapter 1 of [1] and my notes here
Lecture 2	Motivating examples: Max-likelihood estimation	My notes here
Lecture 3	Motivating examples: Logistic regression	My notes here
Lecture 4	Motivating examples: Zeroth-order optimization	Slides here
Lecture 5	Gradient estimation using simultaneous perturbation method	Slides here
Lecture 6	A unified gradient estimate and its bias/variance analysis	Slides here
Lecture 7	Foundations of smooth optimization: local/global optima Necessary conditions for local minima	Chapter 2 of [1]
Lecture 8	Taylor's theorem, smooth functions	Chapter 2 of [1]
Lecture 9	Sufficient conditions for local minima	Chapter 2 of [1]
Lecture 10	Convex sets and functions	Chapter 2 of [1] and Chapters 2,3 of [4]
Lecture 11	Characterization of convex functions	Chapter 2 of [1] and Chapters 2,3 of [4]
Lecture 12	Strongly convex functions	Chapter 2 of [1]
Mini-quiz 1
Lecture 13	Convergence of random variables	Lec 28 of EE5110 course on probability by Prof. Krishna Jagannathan, click here
Lecture 14	Convergence of random variables Conditional expectation	Sec 3.7 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 15	Conditional expectation	Sec 4.6 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 16	Martingales: Introduction	Sec 7.7 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 17	Martingales: Applications from stochastic approximation
Lecture 18	Martingales: maximal inequality	Sec 7.8 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 19	Problem solving session on smoothness, convexity, and topics in probability
Mid-term exam
Lecture 20	Discussion of mini-quiz 1 and mid-term problems
Lecture 21	Martingales: convergence theorem	Sec 7.8 of Grimmett and Stirzaker's book: Probability and random processes
Lecture 22	Stochastic approximation: introduction	Chapter 1 of [2]
Lecture 23	Stochastic approximation: Asymptotic convergence	Chapter 2 of [2]
Mini-quiz 2
Lecture 24	Stochastic approximation: Applications
Lecture 25	Discussion of mini-quiz 2 Non-asymptotics for smooth optimization
Lecture 26	Gradient descent: Analysis for a smooth objective	Chapter 3 of [1]
Lecture 27	Gradient descent: Analysis for a convex/strongly convex objective	Chapter 3 of [1]
Lecture 28	Stochastic gradient algorithm: Non-asymptotic analysis with unbiased gradient estimates	Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23:2341–2368
Mini-quiz 3
Lecture 29	Stochastic gradient algorithm: Non-asymptotic analysis with biased gradient estimates	Bhavsar N, Prashanth LA (2022) Non-asymptotic bounds for stochastic optimization with biased noisy gradient oracles. IEEE Transactions on Automatic Control, 68:1628-1641
Lecture 30	Stochastic gradient algorithm: Non-asymptotic analysis in the strongly convex case	Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. Siam Review 60(2):223–311
Lecture 31	Variance reduction: SAG, SVRG	Gower, R. M., Schmidt, M., Bach, F., & Richtarik, P. (2020). Variance-reduced methods for machine learning. Proceedings of the IEEE, 108(11), 1968-1983