Seminars and Events

Towards Human-Centered Explanations of AI Predictions

Approximate Bayesian Computation via Classification

Optimal Algorithms for Continuous Non-Monotone Submodular Maximization: Theory and Applications

Estimating Differential Networks from Functional Data

Transductive Robust Learning Guarantees

Learning Ising Models with Latent Variables

Policy Learning with Adaptively Collected Data

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Focused Learning in Tree-Structured Graphical Models

Abstract: This talk is about the predictive power of learned graphical models where we show that an incorrect or incomplete combinatorial structure (graph) can nevertheless yield accurate predictions.

In particular, in the first half of the talk, I look into learning tree structured Ising models in which the learned model is used subsequently for prediction based on partial observations (given the realization of a subset of variables, predict the value of the remaining ones). The vast majority of previous work on learning graphical models aims to correctly recover the underlying graph structure. In the data-constrained regime, learning the entire graph structure correctly is usually impossible. I show that it is possible to efficiently learn a tree model that gives accurate predictions even when there is insufficient data to learn the correct structure.
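The abstract does not name the learning algorithm. As a rough sketch only, a classical way to fit a tree-structured model is the Chow-Liu procedure: score every pair of variables and take a maximum-weight spanning tree. The code below assumes ±1 variables with zero external field, and under that assumption it ranks pairs by absolute empirical correlation instead of empirical mutual information. The function name `chow_liu_tree` and the toy samples are hypothetical.

```python
from itertools import combinations


def chow_liu_tree(samples):
    """Return the edges of a maximum-weight spanning tree over the variables.

    samples: list of equal-length tuples with entries in {-1, +1}.
    Pair weights are absolute empirical correlations (an assumption that
    stands in for mutual information in the zero-field case).
    """
    n = len(samples)
    d = len(samples[0])

    # Score every pair of variables by |empirical correlation|.
    scored = []
    for i, j in combinations(range(d), 2):
        corr = sum(s[i] * s[j] for s in samples) / n
        scored.append((abs(corr), i, j))
    scored.sort(reverse=True)

    # Kruskal's algorithm with a small union-find structure.
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    edges = []
    for _, i, j in scored:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Toy data in which variables 0-1 and 1-2 agree more often than 0-2.
toy_samples = [
    (1, 1, 1), (1, 1, 1), (-1, -1, -1), (-1, -1, -1),
    (1, 1, -1), (-1, -1, 1), (1, -1, -1), (1, 1, 1),
]
tree_edges = chow_liu_tree(toy_samples)
```

With few samples the recovered edges may differ from the true tree; the point made in the abstract is that such a tree can still predict well.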

The second half of the talk is about speciation rate estimation in phylogenetic trees. This problem is essentially one of inferring features of the model (in this case, the speciation or extinction rate) from partial observations (the sequences at the leaves of the tree) of a latent tree model (phylogeny). I show that to estimate the speciation rate efficiently, it is not necessary to follow the popular approach of reconstructing the complete tree structure as an intermediate step, which requires long DNA sequences. Instead, one can extract precisely the right type of information about the rates by zooming into carefully chosen local structures. My results show that an incomplete and partially incorrect summary of the tree structure is enough to estimate the speciation rate with the minimax optimal dependence on the length of observed DNA sequences.

Joint work with Guy Bresler, Sebastien Roch and Robert Nowak.

Bio: Mina Karzand is a postdoctoral associate at the University of Wisconsin-Madison. Before that, she was a postdoctoral research associate at MIT, where she received her PhD in Electrical Engineering and Computer Science. Her research interests are in the design and analysis of data-driven inference and decision-making systems at the intersection of machine learning, probability, and information theory.

Predictive inference with the jackknife+

Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes

Designing Explicit Regularizers for Deep Models

The Blessings of Multiple Causes

A general framework for learning DAGs with NO TEARS

Abstract: Interpretability and causality have been acknowledged as key ingredients to the success and evolution of modern machine learning systems. Graphical models, and more specifically directed acyclic graphs (DAGs, also known as Bayesian networks), are an established tool for learning and representing interpretable causal models. Unfortunately, estimating the structure of DAGs from data is a notoriously difficult problem, and as a result existing approaches rely on various local heuristics for enforcing the acyclicity constraint. In this talk, we introduce a fundamentally different strategy: We formulate the structure learning problem as a purely continuous optimization problem that avoids this combinatorial constraint entirely. This optimization problem can be solved efficiently by standard numerical algorithms rather than handcrafted ones, which also makes implementation particularly easy. As a result, we obtain a general framework for learning parametric, nonparametric, and dynamic DAG models that includes GLMs, additive noise models, and index models as special cases.
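The abstract does not spell out the continuous formulation. As a hedged illustration, the NOTEARS paper by Zheng et al. characterizes acyclicity of a weighted adjacency matrix W through the smooth function h(W) = tr(exp(W∘W)) - d, which equals zero exactly when W encodes a DAG. The sketch below evaluates it in plain Python with a truncated power series; the function names and the truncation length `terms` are illustrative assumptions, not the speakers' implementation.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """Approximate h(W) = tr(exp(W∘W)) - d with a truncated power series.

    w: square weighted adjacency matrix (list of lists).
    terms: number of series terms (an assumption; adequate for small,
    moderately scaled matrices).
    """
    d = len(w)
    squared = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_exp = float(d)  # contribution of the identity term
    for k in range(1, terms + 1):
        power = matmul(power, squared)
        trace_exp += sum(power[i][i] for i in range(d)) / math.factorial(k)
    return trace_exp - d


# A chain 0 -> 1 -> 2 is acyclic; a two-node loop 0 <-> 1 is not.
h_dag = acyclicity([[0.0, 1.5, 0.0], [0.0, 0.0, -2.0], [0.0, 0.0, 0.0]])
h_cycle = acyclicity([[0.0, 1.0], [1.0, 0.0]])
```

Because h is differentiable, a constraint h(W) = 0 can be handed to a standard constrained optimizer, which is what lets the combinatorial search over graphs be avoided.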

Joint work with Xun Zheng, Chen Dan, Pradeep Ravikumar, and Eric P. Xing.

Bio: Bryon Aragam is an Assistant Professor and Topel Faculty Scholar in the Booth School of Business at the University of Chicago. His research interests include statistical machine learning, nonparametric statistics, and optimization. He is also involved with developing open-source software and solving problems in interpretability, ethics, and fairness in artificial intelligence.

Prior to joining the University of Chicago, Bryon was a project scientist and postdoctoral researcher in the Machine Learning Department at Carnegie Mellon University. He completed his PhD in Statistics and a Masters in Applied Mathematics at UCLA. He has also served as a data science consultant for technology and marketing firms, where he has worked on problems in survey design and methodology, ranking, customer retention, and logistics.

Sampling as Optimization and Optimization as Sampling

Some Recent Insights on Transfer Learning

Language and Interaction in Minecraft

Optimizing probability distributions for machine learning: sampling meets optimization

Big data is low rank

Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility

Training Neural Networks: The Bigger the Better?

Greg Ongie: A function space view of overparameterized neural networks

Blake Woodworth: The complexity of finding stationary points in convex and non-convex optimization

Learning from Sparse Data

Abstract:

In many scientific domains, the number of individuals in the population under study is often very large, but the number of observations available per individual is often very limited (sparse). Limited observations prohibit accurate estimation of parameters of interest for any given individual. In this sparse data regime, the key question is: how accurately can we estimate the distribution of parameters over the population? This problem arises in various domains such as epidemiology, psychology, health care, biology, and social sciences. As an example, suppose that for a large random sample of the population we have observations of whether a person caught the flu in each of the past 5 years. We cannot accurately estimate the probability of any given person catching the flu with only 5 observations; however, our goal is to estimate the distribution of these probabilities over the whole population. Such an estimated distribution can be used in downstream tasks, like testing and estimating properties of the distribution.

In this talk, I will present our recent results where we show that the maximum likelihood estimator (MLE) is minimax optimal in the sparse observation regime. While the MLE for this problem was proposed as early as the late 1960s, how accurately the MLE recovers the true distribution was not known. Our work closes this gap. In the course of our analysis, we provide novel bounds on the coefficients of Bernstein polynomials approximating Lipschitz-1 functions. Furthermore, the MLE is also efficiently computable in this setting, and we evaluate the performance of the MLE on both synthetic and real datasets.
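To make the setting concrete, here is a minimal sketch of one way to approximate such an MLE: restrict the unknown distribution to a fixed grid on [0, 1] and run EM on the mixture weights, given each individual's count of positive outcomes out of t observations. The grid discretization, the EM solver, the iteration count, and the name `grid_mle` are all assumptions for illustration; the abstract does not say how the authors compute the estimator.

```python
import math


def grid_mle(counts, t, grid_size=51, iters=500):
    """Approximate the MLE of a distribution over success probabilities.

    counts[i]: number of successes observed for individual i out of t trials.
    Returns (grid, weights), where weights[j] is the estimated probability
    mass placed on grid[j].
    """
    n = len(counts)
    grid = [j / (grid_size - 1) for j in range(grid_size)]

    # Only the histogram of counts matters for the likelihood.
    hist = [0] * (t + 1)
    for x in counts:
        hist[x] += 1

    # Binomial likelihood of each possible count at each grid point.
    lik = [[math.comb(t, x) * p ** x * (1 - p) ** (t - x) for p in grid]
           for x in range(t + 1)]

    weights = [1.0 / grid_size] * grid_size
    for _ in range(iters):
        updated = [0.0] * grid_size
        for x in range(t + 1):
            if hist[x] == 0:
                continue
            marginal = sum(w * l for w, l in zip(weights, lik[x]))
            for j in range(grid_size):
                # Posterior mass of grid point j, summed over individuals.
                updated[j] += hist[x] * weights[j] * lik[x][j] / marginal
        weights = [v / n for v in updated]
    return grid, weights


# Flu example: yearly outcomes over t = 5 years for ten hypothetical people.
grid, weights = grid_mle([0, 1, 0, 2, 5, 4, 5, 0, 1, 5], t=5)
```

The returned weights describe the population-level distribution; they are not meant to give a reliable estimate for any single individual.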

Joint work with Weihao Kong, Gregory Valiant, and Sham Kakade.

Bio: Ramya Korlakai Vinayak is a postdoctoral researcher at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, working with Sham Kakade. Her research interests broadly span the areas of machine learning, statistical inference, and crowdsourcing. She received a Ph.D. from Caltech, where she was advised by Babak Hassibi. She was a recipient of the Schlumberger Foundation Faculty for the Future fellowship from 2013-15. She obtained her Master's from Caltech and her Bachelor's from IIT Madras.

On the Computational and Statistical Efficiency of Policy Optimization in (Deep) Reinforcement Learning

First-Order Methods Unleashed: Scalable Optimization in the Age of Big Data

Abstract: First-order methods are a fundamental tool in the design of efficient algorithms for large-scale computational problems. Besides being the optimization workhorse of machine learning, first-order methods have recently served as a springboard for a number of algorithmic advances in discrete optimization, including submodular optimization and maximum flow problems. In this talk, I will showcase a number of results from my research that demonstrate the power of first-order methods as a generic framework for algorithm design.

In the first part, I will describe my view of first-order methods as discretizations of continuous dynamical systems over curved spaces. For convex optimization, such dynamics conserve a specific quantity (the product of time and a notion of duality gap), which immediately guarantees convergence to optimum. This primal-dual view helps us to both design novel algorithms and simplify the analyses of existing ones. In particular, I will discuss how it yields a simple, intuitive analysis of accelerated algorithms and how it allows us to port such algorithms to contexts that do not squarely match standard smoothness assumptions.

In the second part, we will see how to exploit problem-specific structure by preconditioning, i.e., by endowing the space with a curved geometry that facilitates the convergence of the dynamics above. In particular, I will describe how different random-walk-based algorithms for graph partitioning arise from different preconditionings of the same optimization problem, and how combinatorial preconditioners yield nearly-linear-time algorithms for flow problems over undirected graphs.

Bio: Lorenzo Orecchia is an assistant professor in the Department of Computer Science at Boston University. Lorenzo’s research focuses on the design of efficient algorithms for fundamental computational challenges in machine learning and combinatorial optimization. His approach is based on combining ideas from continuous and discrete optimization into a single framework for algorithm design. Lorenzo obtained his PhD in computer science at UC Berkeley under the supervision of Satish Rao in 2011, and was an applied mathematics instructor at MIT under the supervision of Jon Kelner until 2014. He was a recipient of the 2014 SODA Best Paper award and a co-organizer of the Simons semester “Bridging Continuous and Discrete Optimization” in Fall 2017.

From Fair Decisions to Social Benefit

Learn Policy Optimally via Efficiently Utilizing Data

Abstract: Recent years have witnessed increasing empirical successes in reinforcement learning. Nevertheless, it is ironic that many theoretical problems in this field are not well understood even in the most basic setting. For instance, the optimal sample and time complexities of policy learning in finite-state Markov decision processes still remain unclear. Given a state-transition sampler, we develop a novel algorithm that learns an approximately optimal policy in near-optimal time and using a minimal number of samples. The algorithm makes updates by processing samples in a “streaming” fashion, which requires small memory and naturally adapts to large-scale data. Our result resolves the long-standing open problem on the sample complexity of Markov decision processes and provides new insights on how to use data efficiently in learning and optimization.

The algorithm and analysis can be extended to solve two-person stochastic games and feature-based Markov decision problems while achieving near-optimal sample complexity. We further illustrate several other examples of learning and optimization over streaming data, with applications in accelerating astrophysical discoveries and improving network security.

Bio: Lin Yang is currently a postdoctoral researcher at Princeton University working with Prof. Mengdi Wang. He obtained two Ph.D. degrees simultaneously in Computer Science and in Physics & Astronomy from Johns Hopkins University in 2017. Prior to that, he obtained a bachelor’s degree from Tsinghua University. His research focuses on developing fast algorithms for large-scale optimization and machine learning. This includes reinforcement learning and streaming methods for optimization and function approximations. His algorithms have been applied to real-world applications including accelerating astrophysical discoveries and improving network security. He has published numerous papers in top Computer Science conferences including NeurIPS, ICML, STOC, and PODS. At Johns Hopkins, he was a recipient of the Dean Robert H. Roy Fellowship.

Algorithmic fairness in online decision-making

Modern Techniques of Statistical Optimization for Machine Learning

New Thoughts on Adaptivity, Generalization and Interpolation Motivated from Neural Networks

Using Gene Expression to Tell Time

Abstract: Determining the state of an individual’s internal physiological clock has important implications for precision medicine, from diagnosing neurological disorders to optimizing drug delivery. To be useful, such a test must be accurate, minimally burdensome to the patient, and robust to differences in patient protocols, sample collection, and assay technologies. In this talk I will present TimeSignature, a novel machine-learning algorithm to estimate circadian variables from gene expression in human blood. By making use of the high dimensionality of the gene expression measurements and exploiting the periodic nature of the circadian variables we wish to predict, TimeSignature can be applied to samples from disparate studies and yield highly accurate results despite systematic differences between the studies. This generalizability is unique amongst expression-based predictors and addresses a major challenge in the development of robust biomarker tests. This talk will detail the method, present several applications, and discuss our recent work to extend it.

Bio: Rosemary Braun (PhD, MPH) is an Assistant Professor in the Division of Biostatistics (Dept of Preventive Medicine, Feinberg School of Medicine) and the Department of Engineering Sciences and Applied Mathematics at Northwestern University. Dr Braun’s overarching research interests are in the development of mathematical and computational methods to elucidate how large-scale biological phenomena emerge from the complex interplay of thousands of microscopic interactions. To this end, her research group develops machine-learning methods for the statistical analysis of high-dimensional omics data; graph-theoretical approaches for modeling the behavior of regulatory networks; and dynamical simulations to study molecular interactions. Recent publications have appeared in PNAS, Nucleic Acids Research, and Bioinformatics. Dr Braun obtained her PhD in Physics in 2004 from the University of Illinois at Urbana-Champaign, where she studied the statistical physics of living systems under Klaus Schulten. This was followed by an MPH (Concentration in Biostatistics) at Johns Hopkins in 2006. Prior to joining Northwestern, she was a Postdoctoral Fellow at the National Cancer Institute (2006-2011).

Optimization of latent variables in deep network applications

Trend filtering in exponential families

Adaptive Sampling for Ranking and Clustering

Challenges and Hopes of Unsupervised Learning by Mutual Information Maximization

Convergence Rates of Variational Posterior Distributions

On Theory for BART

Analysis of Big Dependent Data

Zero-Order Methods for the Optimization of Noisy Functions

Audio source separation models that learn without ground truth and are open to user correction

Separating an audio scene into isolated sources is a fundamental problem in computer audition, analogous to image segmentation in visual scene analysis. It is an enabling technology for many tasks, such as automatic speech recognition, labeling sound objects in an acoustic scene, music transcription, and remixing of existing recordings. Source separation systems based on deep learning are currently the most successful approaches for solving the underdetermined separation problem, where there are more sound sources (e.g., instruments in a band) than channels (a stereo recording has two channels). Currently, deep learning systems that perform source separation are trained on many mixtures (e.g., tens of thousands) for which the ground truth decompositions are already known. Since most real-world recordings have no such decomposition available, developers train systems on artificial mixtures created from isolated individual recordings. Although there are large databases of isolated speech, it is impractical to find or build large databases of isolated recordings for every arbitrary sound. This fundamentally limits the range of sounds that deep models can learn to separate. Once learned, a deep model’s output is take-it-or-leave-it, and it can be difficult for the end user either to affect the current output or to give corrective feedback for the future.

In this talk Prof. Pardo discusses recent work in two areas. The first is bootstrapping learning of a scene segmentation model using an acoustic cue known to be used in human audition. This allows learning a model without access to ground-truth decompositions of acoustic scenes. The second is ongoing work to provide an interface for an end user to interact with a deep model, to affect the current separation and improve future separation by allowing for retraining of the model from corrective feedback.

Bio: Bryan Pardo is an associate professor in the Northwestern University Department of Electrical Engineering and Computer Science. Prof. Pardo received an M. Mus. in Jazz Studies in 2001 and a Ph.D. in Computer Science in 2005, both from the University of Michigan. He has authored over 100 peer-reviewed publications. He has developed speech analysis software for the Speech and Hearing department of the Ohio State University and statistical software for SPSS, and has worked as a machine learning researcher for General Dynamics. While finishing his doctorate, he taught in the Music Department of Madonna University.

Machine Learning Seminar Series

Spring 2022: Wednesdays at 11:30

Towards Human-Centered Explanations of AI Predictions

Approximate Bayesian Computation via Classification

Optimal Algorithms for Continuous Non-Monotone Submodular Maximization: Theory and Applications

Fall 2021: Wednesdays at 10:30

Estimating Differential Networks from Functional Data

Transductive Robust Learning Guarantees

Learning Ising Models with Latent Variables

Policy Learning with Adaptively Collected Data

Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

Spring 2020: Fridays at 10:30

Winter 2020: Fridays at 10:30

Focused Learning in Tree-Structured Graphical Models

Predictive inference with the jackknife+

Learning Whenever Learning is Possible: Universal Learning under General Stochastic Processes

Designing Explicit Regularizers for Deep Models

The Blessings of Multiple Causes

A general framework for learning DAGs with NO TEARS

Sampling as Optimization and Optimization as Sampling

Some Recent Insights on Transfer Learning

Fall 2019: Fridays at 10:30

Language and Interaction in Minecraft

Optimizing probability distributions for machine learning: sampling meets optimization

Big data is low rank

Convex Set Disjointness, Distributed Learning of Halfspaces, and LP Feasibility

Training Neural Networks: The Bigger the Better?

Greg Ongie: A function space view of overparameterized neural networks

Blake Woodworth: The complexity of finding stationary points in convex and non-convex optimization

Learning from Sparse Data

On the Computational and Statistical Efficiency of Policy Optimization in (Deep) Reinforcement Learning

Academic year 2018-2019

Wednesdays, 1-2pm in the Harper Center (Booth), Room 219. Pizza provided by the UChicago CS Department.

Sign up for announcement email list at https://lists.uchicago.edu/web/subscribe/ml-announce.

First-Order Methods Unleashed: Scalable Optimization in the Age of Big Data

From Fair Decisions to Social Benefit

Learn Policy Optimally via Efficiently Utilizing Data

Algorithmic fairness in online decision-making

Modern Techniques of Statistical Optimization for Machine Learning

New Thoughts on Adaptivity, Generalization and Interpolation Motivated from Neural Networks

Using Gene Expression to Tell Time

Optimization of latent variables in deep network applications

Trend filtering in exponential families

Past Quarters:

Adaptive Sampling for Ranking and Clustering

Challenges and Hopes of Unsupervised Learning by Mutual Information Maximization

Convergence Rates of Variational Posterior Distributions

On Theory for BART

Analysis of Big Dependent Data

Zero-Order Methods for the Optimization of Noisy Functions

Audio source separation models that learn without ground truth and are open to user correction

Error Feedback for Communication Efficient SGD

Additional Machine Learning Events