
# Machine Learning Seminar Series

## Spring 2020: Fridays at 10:30

Friday, Apr. 3, 2020: No talk

Friday, Apr. 10, 2020: No talk

Friday, Apr. 17, 2020: Pritish Kamath, TTIC

Friday, Apr. 24, 2020: Dimitris Papailiopoulos, UW-Madison

Friday, May 1, 2020: Shirley Wu, UT-Austin

Friday, May 8, 2020: Vaishaal Shankar, UC-Berkeley

Friday, May 15, 2020: No talk

Friday, May 22, 2020: Cyrus Rashtchian, UCSD

Friday, May 29, 2020: Laurent Lessard, UW-Madison

Friday, Jun. 5, 2020: Yuanzhi Li, CMU

## Winter 2020: Fridays at 10:30

Friday, Jan. 10, 2020 in TTIC 526: Mina Karzand, UW-Madison

### Focused Learning in Tree-Structured Graphical Models

Abstract: This talk is about the predictive power of learned graphical models where we show that an incorrect or incomplete combinatorial structure (graph) can nevertheless yield accurate predictions.

In particular, in the first half of the talk, I look into learning tree structured Ising models in which the learned model is used subsequently for prediction based on partial observations (given the realization of a subset of variables, predict the value of the remaining ones). The vast majority of previous work on learning graphical models aims to correctly recover the underlying graph structure. In the data-constrained regime, learning the entire graph structure correctly is usually impossible. I show that it is possible to efficiently learn a tree model that gives accurate predictions even when there is insufficient data to learn the correct structure.
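For context, the classical algorithm for fitting a tree-structured model is Chow-Liu: compute pairwise statistics and return a maximum-weight spanning tree. Below is a minimal sketch for symmetric ±1 spins, using |empirical correlation| as a stand-in edge weight (for zero-field Ising variables, mutual information is increasing in |correlation|). This is illustrative background with names of our choosing, not the talk's algorithm.

```python
import numpy as np

def chow_liu_tree(X):
    """Maximum-weight spanning tree (Prim's algorithm) over pairwise
    |empirical correlation|, a stand-in for mutual information when the
    variables are symmetric +/-1 spins. X has shape (n_samples, d)."""
    d = X.shape[1]
    w = np.abs(np.corrcoef(X, rowvar=False))  # d x d edge weights
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # Pick the heaviest edge leaving the current tree.
        i, j = max(((i, j) for i in in_tree
                    for j in range(d) if j not in in_tree),
                   key=lambda e: w[e])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

The point of the talk is that even when the data are too scarce for this tree to match the true structure, the fitted model can still predict well.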

The second half of the talk is about speciation rate estimation in phylogenetic trees. This problem is essentially one of inferring features of the model (in this case, the speciation or extinction rate) from partial observations (the sequences at the leaves of the tree) of a latent tree model (phylogeny). I show that to estimate the speciation rate efficiently, it is not necessary to follow the popular approach of reconstructing the complete tree structure as an intermediate step, which requires long DNA sequences. Instead, one can extract precisely the right type of information about the rates by zooming into carefully chosen local structures. My results show that an incomplete and partially incorrect summary of the tree structure is enough to estimate the speciation rate with the minimax optimal dependence on the length of observed DNA sequences.

Joint work with Guy Bresler, Sebastien Roch and Robert Nowak.

Bio: Mina Karzand is a postdoctoral associate at the University of Wisconsin-Madison. Before that, she was a postdoctoral research associate at MIT, where she received her PhD in Electrical Engineering and Computer Science. Her research interests are in the design and analysis of data-driven inference and decision-making systems, at the intersection of machine learning, probability, and information theory.

Friday, Jan. 17, 2020 in Crerar 390: Rina Foygel Barber, University of Chicago

### Predictive inference with the jackknife+

Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically. Such guarantees are not possible for the original jackknife and we demonstrate examples where the coverage rate may actually vanish. Our theoretical and empirical analysis reveals that the jackknife and jackknife+ intervals achieve nearly exact coverage and have similar lengths whenever the fitting algorithm obeys some form of stability. We also extend to the setting of K-fold cross-validation. Our methods are related to cross-conformal prediction proposed by Vovk [2015] and we discuss connections. This work is joint with Emmanuel Candes, Aaditya Ramdas, and Ryan Tibshirani.
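The construction itself is short: refit with each training point held out, collect the leave-one-out residuals, and take quantiles of the leave-one-out predictions shifted by those residuals. A minimal numpy sketch with ordinary least squares as the symmetric base algorithm (the function name is ours; any regressor that treats training points symmetrically could be substituted):

```python
import numpy as np

def jackknife_plus_interval(X, y, x_new, alpha=0.1):
    """Jackknife+ prediction interval at x_new, with least squares as the
    base algorithm (any regressor symmetric in the training points works)."""
    n = len(y)
    lo, hi = np.empty(n), np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)  # refit without i
        r_i = abs(y[i] - X[i] @ beta)   # leave-one-out residual R_i
        mu_i = x_new @ beta             # leave-one-out prediction at x_new
        lo[i], hi[i] = mu_i - r_i, mu_i + r_i
    k = int(np.ceil((1 - alpha) * (n + 1)))
    if k > n:                           # too few points for a finite interval
        return -np.inf, np.inf
    # floor(alpha*(n+1))-th smallest of lo; ceil((1-alpha)*(n+1))-th smallest of hi
    return np.sort(lo)[n - k], np.sort(hi)[k - 1]
```

Unlike the plain jackknife, the interval endpoints here use the held-out model's own prediction at x_new, which is what yields the distribution-free coverage guarantee in the talk.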

Bio: Rina Foygel Barber is an Associate Professor in the Department of Statistics at the University of Chicago. Before starting at U of C, she was an NSF postdoctoral fellow during 2012-13 in the Department of Statistics at Stanford University, supervised by Emmanuel Candès. She received her PhD in Statistics at the University of Chicago in 2012, advised by Mathias Drton and Nati Srebro, and an MS in Mathematics at the University of Chicago in 2009. Prior to graduate school, she was a mathematics teacher at the Park School of Baltimore from 2005 to 2007.

Friday, Jan. 24, 2020 in TTIC 526: Steve Hanneke, TTIC

### Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes

Abstract: I will present a general theory of learning and generalization without the i.i.d. assumption, starting from first principles. We focus on the problem of universal consistency: the ability to estimate any function from X to Y. We endeavor to answer the question: Does there exist a learning algorithm that is universally consistent under every process on X for which universal consistency is possible? Remarkably, we find the answer is “Yes”. Thus, we replace the i.i.d. assumption with only the most natural (and necessary!) assumption: that learning is at least possible. Along the way, we also find a concise characterization of the family of all processes that admit universally consistent learners. One challenge for learning is that some of these processes do not even have a law of large numbers.

Bio: Steve Hanneke is a Research Assistant Professor at the Toyota Technological Institute at Chicago. His research explores the theory of machine learning, with a focus on reducing the number of training examples sufficient for learning. His work develops new approaches to supervised, semi-supervised, transfer, and active learning, and also revisits the basic probabilistic assumptions at the foundation of learning theory. Steve earned a Bachelor of Science degree in Computer Science from UIUC in 2005 and a Ph.D. in Machine Learning from Carnegie Mellon University in 2009 with a dissertation on the theoretical foundations of active learning.

Friday, Jan. 31, 2020 in Crerar 390: Tengyu Ma, Stanford

### Designing Explicit Regularizers for Deep Models

Abstract: I will discuss some recent results on designing explicit regularizers to improve the generalization performance of deep neural networks. We derive data-dependent generalization bounds for deep neural networks. We empirically regularize the bounds and obtain improved generalization performance (in terms of standard accuracy or robust accuracy). I will also touch on recent results on applying these techniques to imbalanced datasets.

Based on joint work with Colin Wei, Kaidi Cao, Adrien Gaidon, and Nikos Arechiga.

Bio: Tengyu Ma is an assistant professor of Computer Science and Statistics at Stanford University. He received his Ph.D. from Princeton University and B.E. from Tsinghua University. His research interests include topics in machine learning and algorithms, such as deep learning and its theory, non-convex optimization, deep reinforcement learning, representation learning, and high-dimensional statistics. He is a recipient of NIPS’16 best student paper award, COLT’18 best paper award, and ACM Doctoral Dissertation Award Honorable Mention.

Friday, Feb. 7, 2020 in TTIC 526: Yixin Wang, Columbia

### The Blessings of Multiple Causes

Abstract: Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods assume that we observe all confounders, variables that affect both the causal variables and the outcome variables. But whether we have observed all confounders is a famously untestable assumption. We describe the deconfounder, a way to do causal inference from observational data allowing for unobserved confounding.

How does the deconfounder work? The deconfounder is designed for problems of multiple causal inferences: scientific studies that involve many causes whose effects are simultaneously of interest. The deconfounder uses the correlation among causes as evidence for unobserved confounders, combining unsupervised machine learning and predictive model checking to perform causal inference. We study the theoretical requirements for the deconfounder to provide unbiased causal estimates, along with its limitations and tradeoffs. We demonstrate the deconfounder on real-world data and simulation studies.

Friday, Feb. 14, 2020 in Crerar 390: Bryon Aragam, University of Chicago, Booth

### A general framework for learning DAGs with NO TEARS

Abstract: Interpretability and causality have been acknowledged as key ingredients to the success and evolution of modern machine learning systems. Graphical models, and more specifically directed acyclic graphs (DAGs, also known as Bayesian networks), are an established tool for learning and representing interpretable causal models. Unfortunately, estimating the structure of DAGs from data is a notoriously difficult problem, and as a result, existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this talk, we introduce a fundamentally different strategy: we formulate the structure learning problem as a purely continuous optimization problem that avoids this combinatorial constraint entirely. The resulting problem can be solved efficiently by standard numerical algorithms, which avoids handcrafted search procedures and makes implementation particularly easy. As a result, we obtain a general framework for learning parametric, nonparametric, and dynamic DAG models that includes GLMs, additive noise models, and index models as special cases.
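The continuous formulation rests on the NO TEARS acyclicity function: h(W) = tr(e^{W∘W}) − d is non-negative and vanishes exactly when the weighted adjacency matrix W has no directed cycles, so "W is a DAG" becomes an equality constraint h(W) = 0 for a standard solver. A minimal sketch of this function, using scipy's matrix exponential (solver details omitted):

```python
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, where W * W is the elementwise square.
    Non-negative, and zero iff the support of W is acyclic: powers of
    W * W count weighted directed cycles, and the trace sums them."""
    return np.trace(expm(W * W)) - W.shape[0]
```

In the full method this constraint is enforced with an augmented Lagrangian; because h is smooth, with gradient (e^{W∘W})ᵀ ∘ 2W, standard first-order solvers apply directly.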

Joint work with Xun Zheng, Chen Dan, Pradeep Ravikumar, and Eric P. Xing.

Bio: Bryon Aragam is an Assistant Professor and Topel Faculty Scholar in the Booth School of Business at the University of Chicago. His research interests include statistical machine learning, nonparametric statistics, and optimization. He is also involved with developing open-source software and solving problems in interpretability, ethics, and fairness in artificial intelligence.

Friday, Feb. 21, 2020 in TTIC 526: TTIC Student Workshop

Friday, Feb. 28, 2020 in TTIC 526: Xiang Cheng, Berkeley

### Sampling as Optimization and Optimization as Sampling

Abstract: This talk presents a series of results that draw connections between optimization and sampling. In one such result, we show that the Langevin SDE corresponds to the gradient flow of KL divergence with respect to the 2-Wasserstein metric in probability space. This allows us to prove convergence of Langevin MCMC in KL divergence, and even achieve accelerated rates in a similar fashion to Nesterov’s accelerated gradient descent. In the reverse direction, we can also show that Stochastic Gradient Descent may be viewed as the discretization of a certain Stochastic Differential Equation with a state-dependent diffusion matrix that corresponds to the covariance matrix of the sampled stochastic gradient. This theory helps us explain the behavior of SGD in settings such as the training of deep neural networks, where it has been observed that larger noise (in the form of smaller batch-size/larger step-size) gives smaller generalization error.
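As a concrete instance of the first direction, the unadjusted Langevin algorithm is the Euler-Maruyama discretization of the Langevin SDE: each iterate takes a gradient step on log p and adds Gaussian noise. A minimal sketch (step size and iteration counts are illustrative, not from the talk):

```python
import numpy as np

def langevin_mcmc(grad_log_p, x0, step=0.01, n_steps=20_000, seed=0):
    """Unadjusted Langevin algorithm targeting p:
       x_{t+1} = x_t + step * grad log p(x_t) + sqrt(2 * step) * N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        noise = rng.normal(size=x.size)
        x = x + step * grad_log_p(x) + np.sqrt(2 * step) * noise
        samples[t] = x
    return samples
```

Each iteration is literally a noisy gradient ascent step on log p; the KL-as-Wasserstein-gradient-flow result in the abstract makes precise the sense in which this chain is doing optimization over distributions.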

Based on joint work with Peter Bartlett, Niladri Chatterji, Michael Jordan, Yian Ma, and Dong Yin.

Friday, Mar. 6, 2020: No talk

Friday, Mar. 13, 2020 in Crerar 390: Samory Kpotufe, Columbia

### Some Recent Insights on Transfer Learning

Abstract: A common situation in Machine Learning is one where training data is not fully representative of a target population due to bias in the sampling mechanism or high costs in sampling the target population; in such situations, we aim to ’transfer’ relevant information from the training data (a.k.a. source data) to the target application. How much information is in the source data? How much target data should we collect if any? These are all practical questions that depend crucially on ‘how far’ the source domain is from the target. However, how to properly measure ‘distance’ between source and target domains remains largely unclear.

In this talk we will argue that many of the traditional notions of ‘distance’ (e.g., KL-divergence, extensions of TV such as the D_A discrepancy, density ratios, Wasserstein distance) can yield an over-pessimistic picture of transferability. Instead, we show that some new notions of ‘relative dimension’ between source and target (which we simply term ‘transfer-exponents’) capture a continuum from easy to hard transfer. Transfer-exponents uncover a rich set of situations where transfer is possible even at fast rates, help answer questions such as the benefit of unlabeled or labeled target data, yield a sense of optimal vs. suboptimal transfer heuristics, and have interesting implications for related problems such as multi-task learning.

Friday, Mar. 20, 2020 in TTIC 526: Filip Hanzely, KAUST

## Fall 2019: Fridays at 10:30

**SPECIAL TIME:** Wednesday, Oct. 2, 2019 at 1:30pm in TTIC room 526: Arthur Szlam, Facebook Research

### Language and Interaction in Minecraft

Friday, Oct. 11, 2019 in TTIC room 526: Peter Bartlett, UC-Berkeley

### Optimizing probability distributions for machine learning: sampling meets optimization

Abstract: Optimization and sampling are both of central importance in large-scale machine learning problems, but they are typically viewed as very different problems. This talk presents recent results that exploit the interplay between them. Viewing Markov chain Monte Carlo sampling algorithms as performing an optimization over the space of probability distributions, we demonstrate analogs of Nesterov’s acceleration approach in the sampling domain, in the form of a discretization of an underdamped Langevin diffusion. In the other direction, we view stochastic gradient optimization methods, such as those that are common in deep learning, as sampling algorithms, and study the finite-time convergence of their iterates to a stationary distribution.
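The accelerated sampler referred to here augments the position with a velocity variable, giving the underdamped Langevin diffusion. A naive Euler-type discretization, as a cartoon only (the analysis in the talk uses a sharper integrator, and the parameter values are illustrative):

```python
import numpy as np

def underdamped_langevin(grad_f, x0, step=0.01, gamma=2.0,
                         n_steps=50_000, seed=0):
    """Discretized underdamped Langevin dynamics targeting exp(-f):
       dx = v dt,   dv = -(gamma * v + grad_f(x)) dt + sqrt(2 * gamma) dB.
    The stationary law of (x, v) is exp(-f(x) - |v|^2 / 2)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        noise = rng.normal(size=x.size)
        v = v - step * (gamma * v + grad_f(x)) + np.sqrt(2 * gamma * step) * noise
        x = x + step * v
        samples[t] = x
    return samples
```

The added momentum smooths the trajectory, which is the mechanism behind the Nesterov-style acceleration the abstract describes.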

Joint work with Yasin Abbasi-Yadkori, Niladri Chatterji, Xiang Cheng, Michael Jordan, Yi-An Ma, Wenlong Mou, and Martin Wainwright.

Bio: Peter Bartlett is a professor in the Computer Science Division and Department of Statistics and Associate Director of the Simons Institute for the Theory of Computing at the University of California at Berkeley. His research interests include machine learning and statistical learning theory. He is the co-author, with Martin Anthony, of the book Neural Network Learning: Theoretical Foundations. He has served as an associate editor of the journals Bernoulli, Mathematics of Operations Research, the Journal of Artificial Intelligence Research, the Journal of Machine Learning Research, and the IEEE Transactions on Information Theory, and as program committee co-chair for COLT and NeurIPS. He was awarded the Malcolm McIntosh Prize for Physical Scientist of the Year in Australia in 2001, and was chosen as an Institute of Mathematical Statistics Medallion Lecturer in 2008, an IMS Fellow and Australian Laureate Fellow in 2011, and a Fellow of the ACM in 2018. He was elected to the Australian Academy of Science in 2015.

Friday, Oct. 18, 2019 in Crerar room 390: Madeleine Udell, Cornell

### Big data is low rank

Friday, Oct. 25, 2019 in TTIC room 526: Shay Moran, Technion

### Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility

Abstract: We study the *Convex Set Disjointness* (CSD) problem, where two players have input sets taken from an arbitrary fixed domain U ⊂ R^d of size |U| = n. Their mutual goal is to decide using minimum communication whether the convex hulls of their sets intersect (equivalently, whether their sets can be separated by a hyperplane).
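As a point of reference for the parenthetical equivalence: with both sets on one machine, hull disjointness is a single linear feasibility problem (find w, b with w·x − b ≥ 1 on one set and ≤ −1 on the other); the communication constraint is what makes the two-player version hard. A hedged sketch using scipy (function name ours):

```python
import numpy as np
from scipy.optimize import linprog

def hulls_disjoint(P, Q):
    """True iff conv(P) and conv(Q) are disjoint, via LP feasibility of a
    strictly separating hyperplane {x : w @ x = b}, margin normalized to 1."""
    d = P.shape[1]
    # Encode  w @ x - b >= 1 (x in P)  and  w @ x - b <= -1 (x in Q)
    # as rows of A_ub @ [w, b] <= -1.
    A_ub = np.vstack([np.hstack([-P, np.ones((len(P), 1))]),
                      np.hstack([Q, -np.ones((len(Q), 1))])])
    res = linprog(c=np.zeros(d + 1),
                  A_ub=A_ub, b_ub=-np.ones(len(A_ub)),
                  bounds=[(None, None)] * (d + 1))
    return bool(res.status == 0)  # feasible <=> separable <=> disjoint hulls
```

Since w is unbounded, any strict separation can be rescaled to have margin 1, so the margin normalization loses nothing for finite point sets.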

Friday, Nov. 1, 2019 in Crerar room 390: Ohad Shamir, Weizmann

### Training Neural Networks: The Bigger the Better?

Abstract:

**Special Talk:** Distinguished Lecture at TTIC, Friday, Nov. 8, 2019 in TTIC room 526: Shafi Goldwasser, MIT

Friday, Nov. 15, 2019 in Crerar 390: Greg Ongie, University of Chicago; and Blake Woodworth, TTIC

### Greg Ongie: A function space view of overparameterized neural networks

### Blake Woodworth: The complexity of finding stationary points in convex and non-convex optimization

Friday, Nov. 22, 2019 in TTIC 526: Ramya Vinayak, University of Washington

### Learning from Sparse Data

Abstract:

In this talk, I will present our recent results where we show that the maximum likelihood estimator (MLE) is minimax optimal in the sparse observation regime. While the MLE for this problem was proposed as early as the late 1960’s, how accurately the MLE recovers the true distribution was not known. Our work closes this gap. In the course of our analysis, we provide novel bounds on the coefficients of Bernstein polynomials approximating Lipschitz-1 functions. Furthermore, the MLE is also efficiently computable in this setting and we evaluate the performance of MLE on both synthetic and real datasets.

Joint work with Weihao Kong, Gregory Valiant, and Sham Kakade.

Bio: Ramya Korlakai Vinayak is a postdoctoral researcher at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, working with Sham Kakade. Her research interests broadly span the areas of machine learning, statistical inference, and crowdsourcing. She received a Ph.D. from Caltech, where she was advised by Babak Hassibi. She is a recipient of the Schlumberger Foundation Faculty of the Future fellowship from 2013-15. She obtained her Master's from Caltech and Bachelor's from IIT Madras.

Friday, Nov. 29, 2019: Thanksgiving break

Friday, Dec. 6, 2019 in TTIC 526: Zhaoran Wang, Northwestern

### On the Computational and Statistical Efficiency of Policy Optimization in (Deep) Reinforcement Learning

Abstract: Coupled with powerful function approximators such as neural networks, policy optimization plays a key role in the tremendous empirical successes of deep reinforcement learning. In sharp contrast, the theoretical understandings of policy optimization remain rather limited from both the computational and statistical perspectives. From the perspective of computational efficiency, it remains unclear whether policy optimization converges to the globally optimal policy in a finite number of iterations, even given infinite data. From the perspective of statistical efficiency, it remains unclear how to attain the globally optimal policy with a finite regret or sample complexity.

Friday, Dec. 13, 2019: NeurIPS

## Academic year 2018-2019

#### Wednesdays, 1-2pm in the Harper Center (Booth), Room 219. Pizza provided by the UChicago CS Department

###### Sign up for the announcement email list at https://lists.uchicago.edu/web/subscribe/ml-announce.

April 3, 2019 **in Crerar room 390**: Lorenzo Orecchia, Boston University

### First-Order Methods Unleashed: Scalable Optimization in the Age of Big Data

April 10, 2019 **in room C25**: Mesrob Ohannessian, TTIC

### From Fair Decisions to Social Benefit

April 17, 2019 **in room C04**: Lin Yang, Princeton University

### Learn Policy Optimally via Efficiently Utilizing Data

April 24, 2019: Avrim Blum, TTIC

### Algorithmic fairness in online decision-making

April 29, 2019 (2:30pm in Crerar 390): Tong Zhang, Hong Kong University of Science and Technology

### Modern Techniques of Statistical Optimization for Machine Learning

May 1, 2019: Tengyuan Liang, University of Chicago Booth

### New Thoughts on Adaptivity, Generalization and Interpolation Motivated from Neural Networks

May 8, 2019: Rosemary Braun, Northwestern University

### Using Gene Expression to Tell Time

May 15, 2019: Yali Amit, UChicago Statistics

### Optimization of latent variables in deep network applications

May 22, 2019: UChicago Geometric Data Analysis Workshop

May 29, 2019 in **SHFE 203 (Saieh Hall For Economics)**: Dan McDonald

### Trend filtering in exponential families

## Past Quarters:

Jan. 16, 2019: Sumeet Katariya, UW-Madison

### Adaptive Sampling for Ranking and Clustering

Jan. 23, 2019: Karl Stratos, TTIC

### Challenges and Hopes of Unsupervised Learning by Mutual Information Maximization

Feb. 6, 2019: Chao Gao, University of Chicago Statistics

### Convergence Rates of Variational Posterior Distributions

Feb. 13, 2019: Veronika Rockova, University of Chicago, Booth

### On Theory for BART

Feb. 20, 2019: Ruey Tsay, University of Chicago, Booth

### Analysis of Big Dependent Data

Feb. 27, 2019: No Seminar

March 5, 2019 **(TUESDAY), Harper Center 3B from 12:10 to 1:10 p.m.:** Jorge Nocedal, Northwestern University

### Zero-Order Methods for the Optimization of Noisy Functions

March 6, 2019: Bryan Pardo, Northwestern University

### Audio source separation models that learn without ground truth and are open to user correction

March 13, 2019: Sebastian Stich, EPFL

### Error Feedback for Communication Efficient SGD

March 20, 2019: Spring Break, No Seminar

March 27, 2019: Spring Break, No Seminar

## Additional Machine Learning Events