Virtual classroom.
Course staff.
Instructor: Matus Telgarsky.
TA: Ziwei Ji.
When contacting course staff, please always write to both of us.
Evaluation.
Your course grade is 3% hw0, 25% hw1, 25% hw2, 25% hw3, 22% paper summary.
Homework 0 (3% of grade) is multiple choice, and can be found at gradescope.
Homeworks 1–3 (75% of grade):
Please write clean, succinct, unambiguous solutions.
Absolutely no late homework. You will have at least 2 weeks per homework; plan accordingly.
Homework must be typed, though you are free to use whatever typesetting software you wish (latex, markdown, google doc, ms word, …).
You are free to work in groups of at most 2. If you work alone, you get roughly a 10% boost on each homework (3% of the course grade each). Note, however, that homeworks 2 and 3 require a handin from every student; groups of 2 are no longer allowed for those.
Excluding office hours, please discuss with at most three other people outside your group, and list their names on your submissions.
Paper summary (22% of grade):
Once again, you may work in groups of 2, but in contrast to the homeworks, there is no extra credit for groups of size 1.
This will have two phases:
Phase 1 (due November 4, 11:59pm): you give me a list of 5 deep learning theory papers (they must consist primarily of lemmas, theorems, and proofs) that interest you, along with a 1–2 sentence summary of each. The papers must have first appeared no earlier than July 1, 2017. Further information can be found here.
Phase 2 (due December 11, 11:59pm): I pick one of those 5, and you write a 2-page summary, with an additional page (or pages) for references. Further information can be found here.
More information will appear here later, including examples and templates.
Academic integrity.
All submitted homework must be in your own words; communication between groups should be sufficiently high-level to guarantee that solutions do not look similar. This also means that effectively working in a 2-person group and then splitting up to claim the solo extra credit is not viable.
We prefer you do not dig around for homework solutions; if you do rely upon external resources, cite them, and still write your solutions in your own words. Please see Jeff Erickson’s discussion of academic integrity.
When integrity violations are found, they will be submitted to the department’s evaluation board.
Readings and prerequisites.
Basic math and proof writing; hw0 will help you determine if you are ready, and hw1, out on 9/8, will make the situation clear.
Basic machine learning, for instance the material in my CS 446 course.
“Understanding Machine Learning”, by Shai Shalev-Shwartz and Shai Ben-David, can be downloaded from that page and is free for personal use. I think this book is a wonderful resource; I find its presentation very clear, direct, and minimal.
Schedule will be continuously updated; check back often!
Course lecture notes (also continuously updated). (I need a bit more time to re-enable PDF notes; please use your browser’s “print to PDF” feature for now.)
| Date | Topic | Assignments |
| --- | --- | --- |
| 8/25 | Course introduction. Notes section 1. | hw0 out (on gradescope). |
| 8/27 | Approximation overview; univariate case. Notes sections 1–3, tablet notes. | |
| 9/1 | Start of multivariate approximation. Notes sections 3–4, tablet notes. | hw0 due. |
| 9/3 | Classical multivariate approximation. Notes sections 4–5, tablet notes. | |
| 9/8 | Fourier-based multivariate apx (Barron’s Theorem). Notes sections 5–6, tablet notes. | |
| 9/10 | Sampling infinite-width networks via Maurey’s lemma. Notes sections 6–7, tablet notes. | |
| 9/15 | Benefits of depth, part 1: proof sketch and linear region upper bound. Notes section 8, tablet notes. | hw1 handout, hw1 template. |
| 9/17 | Benefits of depth, part 2: full depth separation proof. Notes sections 8–9, tablet notes. | |
| 9/22 | NTK and minimum norm functions. Notes section 10, tablet notes. | |
| 9/24 | NTK and minimum norm functions. Notes section 10, tablet notes. | |
| 9/29 | Concluding remarks on NTK and apx. Notes section 10, tablet notes. | |
| 10/1 | Optimization: overview; smoothness and gradient descent. Notes sections 11–12, tablet notes. | |
| 10/6 | Smoothness and convexity for gradient flow and gradient descent. Notes section 12, tablet notes. | |
| 10/8 | Strong convexity. Notes section 13, tablet notes. | |
| 10/13 | Finishing strong convexity; starting stochastic gradients. Notes section 14, tablet notes. | hw1 due (10/14). |
| 10/15 | Stochastic gradients; start of NTK opt. Notes section 14, tablet notes. | |
| 10/20 | Shallow NTK GD analysis with smooth activations. Notes section 15, tablet notes. | project phase 1 out. |
| 10/22 | Shallow NTK GD analysis with smooth activations. Notes section 15, tablet notes. | |
| 10/27 | Nonsmoothness. Notes section 16, tablet notes. | |
| 10/29 | Nonsmoothness; start of implicit bias. Notes sections 16–17, tablet notes. | |
| 11/3 | (No class: “all-campus holiday”!) | hw2 handout, hw2 template. project phase 1 due (11/4). |
| 11/5 | Implicit bias and margin maximization. Notes section 17, tablet notes. | |
| 11/10 | Implicit bias and margin maximization. Notes section 17, tablet notes. | project phase 2 out. |
| 11/12 | Implicit bias and margin maximization (continued). Notes section 17, tablet notes. | |
| 11/17 | Implicit bias and margin maximization (final part). Notes sections 17–18, tablet notes. | |
| 11/19 | Start of generalization: concentration. Notes sections 19–20, tablet notes. | hw2 due (11/20). |
| 12/1 | Rademacher complexity basics. Notes section 21, tablet notes. | |
| 12/3 | Rademacher complexity continued; logistic regression. Notes section 21, tablet notes. | hw3 handout, hw3 template. |
| 12/8 | Rademacher bounds for deep networks. Notes sections 22–25, tablet notes. | |
| 12/11 | (No class.) | project phase 2 due. |
| 12/16 | (No class.) Office hours in Zoom, 5pm. | |
| 12/18 | (No class.) | hw3 due. |