Time & Place. MW 3:30-4:45, Siebel 1214.
Instructor. Matus Telgarsky (Office hours: Siebel 3212, W 4:45-6:00).
TA. Yucheng Chen (Office hours: Siebel 2107, M 5-6; M 5-7 when homework is due).
Evaluation. Homework is 80% of your grade; the project is 20%. All handed-in work must be \(\LaTeX\)-compiled, on time, and submitted through Gradescope (self-enrollment code MZZK25). Homework and project details appear below.
Academic integrity. Cheating in this class wastes everyone’s time; if you are inclined to cheat, just take something else. Please see the full information below.
Discussion and announcements. Piazza; here’s the signup link.
Feedback. Please reach out with any comments/concerns!
Notes are typed when possible; otherwise handwritten notes are scanned.
The schedule for future lectures is imprecise.
Date. | Topics. | Notes. | Coursework. |
8/27 | Overview. | pdf. | hw0 out: tex, pdf. |
Representation/approximation. | |||
8/29 | Overview; failure of linear. | pdf. | |
9/3 | No class: Labor Day! | | hw0 due! |
9/5 | Box apx (linear over boxes; decision trees). | pdf. | |
9/10 | Box apx (3 layer networks). | pdf. | |
9/12 | Poly apx (RBF SVM, 2-layer networks). | pdf. | |
9/17 | Succinct 1 (start of depth separation). | pdf. | |
9/19 | Succinct 2 (end of depth separation). | pdf. | hw1 out: tex, pdf. hw1v2: tex, pdf, diff. hw1v3: tex, pdf, diff. |
9/24 | Succinct 3 (networks for squaring and distributions). | pdf. | |
Optimization & online learning. | |||
9/26 | Online learning 1: Perceptron. | pdf. | |
10/1 | Online learning 2: concept learning. | pdf. | |
10/3 | Batch optimization 1: smooth. | pdf. | |
10/8 | Batch optimization 2: strongly convex; Lipschitz. | pdf. | hw1 due. |
10/10 | Maurey sparsification and Frank-Wolfe. | pdf. | |
10/15 | Convex risk minimization and classification. | pdf. | |
10/17 | No class. “Fun reading”: my old convexity notes: | pdf, pdf. | |
10/22 | Approximate gradients. | pdf. | |
10/24 | Optimization summary and open problems. | pdf. | |
Generalization. | |||
10/29 | Concentration of measure. | pdf. | project proposal: tex, bib, pdf. |
10/31 | Finite classes and primitive covers. | pdf. | |
11/5 | Symmetrization and Rademacher complexity. | pdf. | hw2 out: tex, pdf. hw2v2: tex, pdf, diff. |
11/7 | No class! | ||
11/12 | Properties of Rademacher complexity. | pdf. | |
11/14 | Classification bounds. | pdf. | proposal due; proposal meetings. |
11/26 | VC dimension of linear functions and linear threshold networks. | pdf. | |
11/28 | VC dimension of ReLU networks. | pdf. | hw2 due. |
12/3 | Covering numbers and Rademacher complexity. | pdf. | hw3 out: tex, pdf. |
12/5 | Covering and Rademacher bounds for neural networks. | pdf. | |
Miscellaneous. | |||
12/10 | Nearest neighbor. | pdf. | |
12/12 | Fast rates. | pdf. | |
Final presentations and homework. | |||
12/13 | Reading day: project presentations 1-4pm! | ||
Groups. You may work individually, or in pairs.
Content. The project must contain a theoretical component; whether you include anything else as well is up to you, and will not fundamentally affect the grade. Please focus on quality: if you can make something cleaner and shorter without removing information, that is preferred.
Milestones. We’ll schedule meetings sometime in early November so that I can sanity check all projects.
Other learning theory-ish classes. These courses all differ, all have good material, and there are many more I neglected to include!
Lieven Vandenberghe @ UCLA. This is not a learning theory course; it’s part 3 of a long optimization course, covering material not in the standard Boyd-Vandenberghe book. The lecture links are to slides; the proofs there are incredibly clean, and indeed this is my favorite resource for many of these methods.
Textbooks and surveys. Again, there are many others, but here are a key few.
Boucheron, Bousquet, and Lugosi have two excellent surveys: one is a gentler start, whereas the other even gets to some advanced topics. I read the easier one immediately upon entering grad school (despite having taken a good learning theory course in undergrad) and really liked it.
Shai Shalev-Shwartz and Shai Ben-David have an excellent textbook from 2014: “Understanding Machine Learning: From Theory to Algorithms”.
Shai Shalev-Shwartz also has an excellent monograph covering online learning.
Devroye, Györfi, and Lugosi have a wonderful book from 1996, “A Probabilistic Theory of Pattern Recognition”, which in addition to covering standard topics in statistical learning theory, has many nice results which rarely enter courses, for instance Stone’s original argument for the consistency of \(k\)-nn in finite-dimensional Euclidean space.
Martin Anthony and Peter Bartlett have a wonderful 1999 book, “Neural Network Learning: Theoretical Foundations”, which is again in the setting of statistical learning theory, and is the only listed reference with extensive VC dimension coverage.
Csaba Szepesvari and Tor Lattimore have an excellent bandit book based on a course they taught.