CS5691: Pattern Recognition and Machine Learning
Course information
When: Jan-May 2019
Lectures: Slot E
Where: CS26
Teaching Assistants: Ajay Kumar Pandey, Anubha Pandey, Subhrajit Makur, Omji Gupta, Tapdiya Aarti Shrikishan, Nirav Narharibhai Bhavsar
Course Content
Bayes decision theory and Bayes classifier
Maximum likelihood and Bayesian parameter estimation
Nonparametric density estimation
Linear models
Linear least-squares regression, logistic regression, regularized least squares, bias-variance tradeoff, Perceptron
Statistical learning theory
PAC learning, empirical risk minimization, uniform convergence and VC-dimension
Support vector machines and kernel methods
Ensemble Methods
Bagging, Boosting
Multilayer neural networks
Feedforward networks, backpropagation
Mixture densities and EM algorithm
Clustering
K-means, spectral
Dimensionality reduction
Singular value decomposition, principal component analysis
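As a small taste of the last item above, the principal components of a dataset can be read off directly from the singular value decomposition of the centred data matrix. The following is a minimal illustrative sketch in NumPy (not course-provided code; the toy data and variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 points in 3-D, with column scales chosen so most
# variance lies along the first coordinate direction
X = rng.normal(size=(100, 3)) @ np.diag([5.0, 1.0, 0.2])

# Centre the data, then take the SVD: Xc = U S Vt.
# The rows of Vt are the principal directions, and S**2/(n-1)
# gives the variance explained along each direction.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_var = S**2 / (len(X) - 1)

# Project the data onto the top principal component
Z = Xc @ Vt[0]
```

This is exactly the link between SVD and PCA developed later in the course: maximizing projected variance is solved by the leading right singular vector of the centred data matrix.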
Grading
Midterm: 20%
Final exam: 30%
Quizzes: 15% (Best 3 out of 4)
Programming Assignments: 20% (10% for each assignment)
Programming Contest: 15%
Important Dates
Exam  On 

Quiz 1  Feb 1 
Quiz 2  Feb 22 
Midsem  Mar 8 
Quiz 3  Mar 22 
Quiz 4  Apr 12 

Assignment  Available on  Submission by 

Assignment 1  Feb 8  Mar 1 
Assignment 2  Mar 15  Apr 12 
Programming Contest  Apr 26 
Textbooks
Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification, John Wiley, 2001
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Additional References
Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning, Cambridge Univ. Press, 2014
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2014
Schedule
Lecture number  Topics Covered  Reference 

Lecture 1  Review of probability: Conditional probability, Bayes theorem  [GS] Sec 1.4, 1.5, 3.1, 3.2 
Lecture 2  Review of probability: Joint distribution, conditional distribution and expectation  [GS] Sec 3.6, 3.7 
Lecture 3  Review of probability: Bivariate normal distribution  [GS] Sec 4.5, 4.6 
Lecture 4  Review of probability: Change of variable, bivariate normal as an example  [GS] Sec 4.7 
Lecture 5  Linear algebra: review  [St] 
Lecture 6  Multivariate normal distribution  [GS] 4.9 
Lecture 7  Introduction to classification and regression, Bayes classifier  [DHS] Chapter 2 
Lecture 8  Bayes classifier, optimality in the two-class case  [DHS] Chapter 2 
Lecture 9  Adaptation of Bayes classifier to (i) handle loss functions, (ii) multiclass case  [DHS] Chapter 2 
Lecture 10  Bayes classifier: case when class-conditional densities are normal; brief introduction to discriminant functions  [DHS] Chapter 2 
Lecture 11  Parametric density estimation, mean-square error  [DHS] Chapter 3 
Lecture 12  Maximum likelihood (ML) estimation: introduction, and a discrete example  [DHS] Chapter 3 
Lecture 13  ML estimation: Gaussian case  [DHS] Chapter 3 
Lecture 14  Bayesian parameter estimation: Bernoulli case  [DHS] Chapter 3 
Lecture 15  Bayesian parameter estimation: Multinomial and Gaussian cases  [DHS] Chapter 3 
Lecture 16  Linear Regression: introduction  
Lecture 17  Linear Regression: A geometric viewpoint  [St] Chapter on orthogonality 
Lecture 18  Maximum likelihood and sum of squares  [B] Section 3.1.1 
Lecture 19  Bias-variance tradeoff  [B] Sec 3.2, and Borkar's article 
Lecture 20  Bias-variance tradeoff  [B] Sec 3.2 
Lecture 21  Polynomial regression and regularization  [B] Sec 1.1 
Lecture 22  A crash course on unconstrained optimization: First-order optimality conditions  [BT] Sec 3.2 
Lecture 23  A tour of convexity  
Lecture 24  Gradient descent, LMS algorithm for least squares  [Si] Chapter 3 
Lecture 25  Perceptron algorithm  [Si] Chapter 1 
Lecture 26  Convergence analysis of Perceptron  [Si] Chapter 1 
Lecture 27  Linear models for classification: Probabilistic generative models  [B] Sec 4.2 
Lecture 28  Probabilistic discriminative models: Logistic regression  [B] Sec 4.3 
Lecture 29  Iterative reweighted least squares  [B] Sec 4.3 
Lecture 30  Introduction to SVMs  [Sa] Lec 31 
Lecture 31  SVM: Linearly separable case  [Sa] Lec 31, 32 
Lecture 32  SVM: Nonlinearly separable case  [SS] Chapter 7 
Lecture 33  The Kernel trick  [Sa] Lec 33 
Lecture 34  Kernel regression  [Sa] Lec 34 
Lecture 35  Neural networks: Introduction  [Sa] Lec 25 
Lecture 36  Backpropagation  [Sa] Lec 26 
Lecture 37  Gaussian mixture models and ML estimation  [B] Sec 9.2 
Lecture 38  EM algorithm  [B] Sec 9.3 
Lecture 39  EM algorithm; K-means clustering  [B] Sec 9.1 
Lecture 40  Connection between K-means and EM; introduction to PCA  [B] Sec 9.3 
Lecture 41  PCA: Minimum error formulation  [B] Sec 12.1 
Lecture 42  PCA: Maximum variance formulation; SVD  [B] Sec 12.1 
Lecture 43  Decision trees  [B] Sec 14.4 
Lecture 44  Decision trees  [B] Sec 14.4 
Lecture 45  PAC learning  [KV] Chapter 1 
Lecture 46  Empirical risk minimization: Consistency  [Sa] Lec 20 
Lecture 47  Empirical risk minimization: Uniform convergence, and VCdimension  [Sa] Lec 21 
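The perceptron algorithm covered in Lectures 25-26 is simple enough to sketch in full: cycle through the data, and whenever a point is misclassified, add (label times input) to the weight vector. On linearly separable data this converges in finitely many updates. A minimal illustrative sketch (my own toy data and naming, not course material):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Perceptron on linearly separable data with labels y in {-1, +1}.
    The bias is folded in by appending a constant feature of 1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:   # misclassified (or on the boundary)
                w += yi * xi         # perceptron update rule
                mistakes += 1
        if mistakes == 0:            # a full pass with no errors: converged
            break
    return w

# Separable toy problem: label is the sign of the first coordinate
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron(X, y)
```

The convergence analysis of Lecture 26 bounds the number of updates in terms of the data radius and the separation margin.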
References:
[GS] Geoffrey Grimmett and David Stirzaker, Probability and random processes
[St] Gilbert Strang, Linear Algebra and Its Applications
[DHS] Richard O. Duda, Peter E. Hart and David G. Stork, Pattern Classification
[B] Christopher M. Bishop, Pattern Recognition and Machine Learning
[BT] Dimitri P. Bertsekas and John N. Tsitsiklis, Neuro-dynamic programming
[Si] Simon Haykin, Neural networks and learning machines
[SS] Bernhard Schölkopf and Alexander J. Smola, Learning with Kernels
[KV] Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory
[Sa] P. S. Sastry, Lectures on pattern recognition