My name is Matus Telgarsky;
I study mathematical aspects of deep learning.
Directional convergence and alignment in deep learning.
Ziwei Ji, Matus Telgarsky; NeurIPS 2020.
This work proves that the parameters of standard deep networks converge in direction and are asymptotic critical points of the margin objective, generalizing and strengthening many prior implicit bias results, and implying a variety of new ones. See also our older work on alignment in deep linear networks.
Ziwei Ji, Matus Telgarsky; ICLR 2020.
This work shows that networks of small, empirically-motivated width can still achieve arbitrarily small test error near initialization (sometimes even with matching test error lower bounds), assuming small networks with good margins exist (stated formally as a margin assumption). This was the first work to use a width smaller than the number of training points.
Risk and parameter convergence of logistic regression.
Ziwei Ji, Matus Telgarsky; COLT 2019.
This work shows that gradient descent with linear predictors and exponentially-tailed losses selects minimum-norm predictors; for linearly-separable data, this translates into margin maximization, but this work moreover gives a nonseparable analysis. See also the independent work of Soudry, Hoffer, Nacson, Gunasekar, Srebro for another proof technique in the linearly separable case.
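The separable case can be replayed in a few lines of numpy. The toy data and step size below are illustrative choices, not from the paper; for this data the maximum-margin direction is known analytically, and gradient descent on the logistic loss drives the normalized iterate toward it:

```python
import numpy as np

# Illustrative toy data (not from the paper): rows are the signed
# examples y_i * x_i of a linearly separable problem.  For this set,
# the hard-margin (minimum-norm) direction is (1,1)/sqrt(2): the
# constraint w . (1,1) >= 1 is the only active one at the SVM optimum.
V = np.array([[1.0, 1.0], [1.0, 1.0], [4.0, 1.0], [1.0, 3.0]])

w = np.zeros(2)
lr = 0.1  # illustrative step size
for _ in range(100_000):
    # gradient of the logistic loss  sum_i log(1 + exp(-w . v_i))
    sigma = 1.0 / (1.0 + np.exp(V @ w))
    grad = -(V * sigma[:, None]).sum(axis=0)
    w -= lr * grad

u = w / np.linalg.norm(w)
u_star = np.array([1.0, 1.0]) / np.sqrt(2.0)
print(u @ u_star)  # cosine similarity with the max-margin direction; close to 1
```

Note that the norm of the iterate grows without bound (roughly logarithmically) while only its direction converges, which is why the comparison is between normalized vectors.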
Spectrally-normalized margin bounds for deep networks.
Peter Bartlett, Dylan Foster, Matus Telgarsky; NIPS 2017.
This work gives a generalization bound that scales with the spectrally-normalized margin of deep networks, and demonstrates empirically both that this bound correlates with generalization and that gradient descent prefers large-margin predictors.
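The complexity term appearing in the bound can be computed directly from the weight matrices. Below is a minimal numpy sketch, with the reference matrices set to zero and the (2,1) norm read as the sum of column ℓ2 norms; see the paper for the precise conventions:

```python
import numpy as np

def spectral_complexity(weights):
    """Spectrally-normalized complexity of a chain of weight matrices:
    prod_i ||W_i||_sigma * ( sum_i (||W_i||_{2,1} / ||W_i||_sigma)^{2/3} )^{3/2},
    with reference matrices taken to be zero (a simplification)."""
    spec = [np.linalg.norm(W, 2) for W in weights]                # spectral norms
    two_one = [np.linalg.norm(W, axis=0).sum() for W in weights]  # sum of column l2 norms
    ratio = sum((t / s) ** (2.0 / 3.0) for t, s in zip(two_one, spec))
    return float(np.prod(spec) * ratio ** 1.5)

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((20, 20)) / np.sqrt(20) for _ in range(3)]
print(spectral_complexity(Ws))
```

A quick sanity check on the formula: the (2,1)-to-spectral ratios are scale-invariant, so rescaling every layer by a constant c multiplies the complexity by exactly c raised to the number of layers.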
Benefits of depth in deep networks.
Matus Telgarsky; COLT 2016.
This work gave the first proof that there exist deep networks which are hard to approximate by shallow networks, even if those shallow networks have exponentially more nodes than the deep network. See also my earlier, simpler, more specific construction, and this cleaner proof in my deep learning theory lecture notes.
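The heart of the construction can be replayed in a few lines: a "tent" map is exactly representable with two ReLUs, and composing it k times yields 2**k monotone pieces, whereas a one-hidden-layer ReLU network with m units is piecewise linear with at most m+1 pieces, so matching the composition requires exponentially many units. A minimal numpy sketch (the sampling grid is an illustrative choice):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def tent(x):
    # The tent map on [0,1] via two ReLUs: 2x on [0,1/2], 2-2x on [1/2,1].
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

k = 4
x = np.linspace(0.0, 1.0, 4097)   # grid contains all breakpoints exactly
y = x.copy()
for _ in range(k):
    y = tent(y)                   # k-fold composition: 2**k monotone pieces

d = np.diff(y)
pieces = int(np.sum(d[:-1] * d[1:] < 0)) + 1
print(pieces)  # 16 = 2**k
```

Counting sign changes of the finite differences recovers the number of monotone pieces because the grid values here are exact dyadic rationals, so no breakpoint is blurred by rounding.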
Margins, shrinkage, and boosting.
Matus Telgarsky; ICML 2013.
This work gave the first proof that a first-order method (coordinate descent) on exponentially-tailed losses has an implicit bias towards the maximum-margin direction.
I am also actively writing lecture notes on deep learning theory (new version, old version). My other interests include clustering, unsupervised learning, interpretability, and reinforcement learning.
Deep learning theory (CS 540 / CS 598 DLT): fall 2022, fall 2021, fall 2020, fall 2019.
Deep learning theory lecture notes: new version, old version.
Machine learning (CS 446): spring 2022, spring 2021, spring 2019, spring 2018.
Some course materials.
Machine learning theory (CS 598 TEL): fall 2018, fall 2017, fall 2016.
I was very fortunate to receive my PhD from UCSD in 2013 under glorious Sanjoy Dasgupta.
I have two glorious PhD students:
Ziwei Ji (吉梓玮) (here is a glorious 2020 talk he gave on some of our implicit bias work).
Justin D. Li. (Just getting started, glorious talks forthcoming.)
My research is funded by an NSF CAREER award, and an NVIDIA GPU grant.
During summer 2019 I co-organized a Simons Institute summer program on deep learning with Samy Bengio, Aleksander Mądry, Elchanan Mossel; I was also at the Simons Institute during Spring 2017.
I co-founded the Midwest ML Symposium (MMLS) and moreover co-chaired the 2017 and 2018 editions, all together with glorious Po-Ling Loh.
I have a degree in violin performance from Juilliard, but hardly play any more.
I coded a screensaver, a 3-d plotting tool, and a few other things if you know where to look.
I like scifi books, pencils, ramen, and aphex twin.