Matus Telgarsky.

Hello, Friend.

My name is Matus Telgarsky;
I study mathematical aspects of deep learning.

Selected work (see also arXiv and Google Scholar).

  1. Directional convergence and alignment in deep learning.

    • Ziwei Ji, Matus Telgarsky; NeurIPS 2020.

    • This work proves that the parameters of standard deep networks converge in direction and are asymptotic critical points of the margin objective, generalizing and strengthening many old implicit bias results, and implying a variety of new ones. See also our older work on alignment in deep linear networks.

  2. Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.

    • Ziwei Ji, Matus Telgarsky; ICLR 2020.

    • This work shows that networks with empirically-motivated small width can still achieve arbitrarily small test error near initialization (indeed sometimes with matching test error lower bounds), assuming these small networks exist (stated formally as a margin assumption). This was the first work to use a width smaller than the number of training points.

  3. Risk and parameter convergence of logistic regression.

    • Ziwei Ji, Matus Telgarsky; COLT 2019.

    • This work shows that gradient descent with linear predictors and exponentially-tailed losses selects minimum-norm predictors; for linearly-separable data, this translates into margin maximization, but this work moreover gives a nonseparable analysis. See also the independent work of Soudry, Hoffer, Nacson, Gunasekar, Srebro for another proof technique in the linearly separable case.

  4. Spectrally-normalized margin bounds for neural networks.

    • Peter Bartlett, Dylan Foster, Matus Telgarsky; NIPS 2017.

    • This work firstly gives a generalization bound which scales with the spectrally-normalized margin of deep networks, and secondly demonstrates empirically that this bound correlates with generalization, and that gradient descent prefers large margin predictors.

  5. Benefits of depth in neural networks.

  6. Margins, shrinkage, and boosting.

    • Matus Telgarsky; ICML 2013.

    • This work gave the first proof that a first-order method (coordinate descent) on exponentially-tailed losses has an implicit bias towards the maximum margin direction.

I am also actively writing lecture notes on deep learning theory.
My other interests include clustering, unsupervised learning, interpretability, and reinforcement learning.


  1. Deep learning theory (CS 540, CS 598 DLT): fall 2022, fall 2021, fall 2020, fall 2019.
    Deep learning theory lecture notes.

  2. Machine learning (CS 446): spring 2022, spring 2021, spring 2019, spring 2018.
    Some course materials.

  3. Machine learning theory (CS 598 TEL): fall 2018, fall 2017, fall 2016.