Machine Learning Across Disciplines: New Theoretical Developments

The Griffin Applied Economics Incubator Conference

June 23, 2022

The David Rubenstein Forum | 1201 E. 60th Street, Chicago, IL 60637

Conference Organizers

Stephane Bonhomme | The Ann L. and Lawrence B. Buttenwieser Professor in Economics and the College

Michael Franklin | Liew Family Chairman of Computer Science

Dan Nicolae | Professor and Chairman of Statistics

Azeem M. Shaikh | Ralph and Mary Otis Isham Professor in Economics and the College

 

About the Event

Machine learning methods have revolutionized modern data analysis. This conference will bring together researchers from different disciplines — statistics, computer science, and econometrics — to discuss recent advances in our understanding of the theory underlying these methods, emphasizing theoretical guarantees on their performance. A primary goal of the conference will be to foster interactions across researchers in different areas, thereby deepening our understanding of current methods and providing conceptual tools for the development of new ones.

Speakers

Adel Javanmard, University of Southern California

Stefan Wager, Stanford University

Misha Belkin, University of California San Diego

Denis Chetverikov, UCLA

Whitney Newey, MIT

Jianqing Fan, Princeton University

Huibin (Harry) Zhou, Yale University

Robert Nowak, University of Wisconsin

Vasilis Syrgkanis, Microsoft

Tengyu Ma, Stanford University

Agenda

8:00 – 8:40 a.m.

Breakfast

8:50 – 9:25 a.m.

Session 1: Automatic Debiased Machine Learning and Minimax Rates Under Approximate Sparsity
• Whitney K. Newey, MIT

Abstract

We give automatic debiased machine learners of any parameter of interest that is a linear, mean-square continuous functional of a regression. Examples of such parameters include a coefficient of a high-dimensional linear regression, the average treatment effect, and a weighted average derivative of the average dose response for a continuous treatment. The debiasing uses the functional of interest to learn a bias correction; the learners are thus automatic in depending only on the functional of interest and the regression. For learning the average treatment effect via neural nets, the automatic debiasing outperforms state-of-the-art debiasing based on the inverse propensity score in a synthetic data example. For Lasso estimation of approximately sparse regressions, we give automatic learners that are root-n consistent under minimal approximate sparsity of the regression as well as under the usual doubly robust rate condition.
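To make the idea concrete, here is a minimal sketch of automatic debiasing for the average treatment effect using a linear sieve basis. The data-generating process, basis choice, and sample size are illustrative assumptions, not the speaker's implementation: the plug-in regression estimate is corrected by a Riesz representer learned directly from the ATE functional applied to the basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
X = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(-X[:, 0]))            # propensity depends on X (confounding)
D = rng.binomial(1, p).astype(float)
theta_true = 2.0
Y = theta_true * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def basis(d, x):
    """Simple linear sieve basis b(D, X)."""
    return np.column_stack([np.ones(len(d)), d, x])

B = basis(D, X)
coef, *_ = np.linalg.lstsq(B, Y, rcond=None)  # regression learner g-hat
ghat = B @ coef

# The ATE functional is m(W, g) = g(1, X) - g(0, X); applying it to the
# basis functions gives the moments that identify the Riesz representer.
B1, B0 = basis(np.ones(n), X), basis(np.zeros(n), X)
M = (B1 - B0).mean(axis=0)                    # E[m(W, b)]
Q = B.T @ B / n                               # E[b b']
alpha = B @ np.linalg.solve(Q, M)             # estimated Riesz representer alpha(D, X)

# Automatic debiased estimate: plug-in term plus bias correction.
theta_hat = ((B1 - B0) @ coef + alpha * (Y - ghat)).mean()
print(theta_hat)
```

Note that only the functional (via `M`) and the regression enter the construction; nothing here uses the propensity score directly, which is the sense in which the debiasing is "automatic."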

9:25 – 10:00 a.m.

Session 2: Debiased Machine Learning for Dynamic Treatment Effects
• Vasilis Syrgkanis, Microsoft

Abstract

We consider the estimation of treatment effects in settings when multiple treatments, discrete or continuous, are assigned over time and treatments can have a causal effect on future outcomes or the state of the treated unit. We propose an extension of the double/debiased machine learning framework to estimate the dynamic effects of treatments, which can be viewed as a Neyman orthogonal (locally robust) cross-fitted version of g-estimation in the dynamic treatment regime. Our method applies to a general class of non-linear dynamic treatment models known as Structural Nested Mean Models and allows the use of machine learning methods to control for potentially high dimensional state variables, subject to a mean square error guarantee, while still allowing parametric estimation and construction of confidence intervals for the structural parameters of interest. Our work is based on a recursive peeling process, typical in g-estimation, and formulates a strongly convex objective at each stage, which allows us to extend the g-estimation framework in multiple directions: i) to provide finite sample guarantees, ii) to estimate non-linear effect heterogeneity with respect to fixed unit characteristics, within arbitrary function spaces, enabling a dynamic analogue of the RLearner algorithm for heterogeneous effects, iii) to allow for high-dimensional sparse parameterizations of the target structural functions, enabling automated model selection via a recursive lasso algorithm. We also discuss dynamic treatment effect estimation in a fully non-parametric and high-dimensional setting with discrete treatments and propose a novel automated debiased machine learning method based on recursive estimation of Riesz representers.

Based on joint works with: Greg Lewis, Victor Chernozhukov, Whitney Newey and Rahul Singh

10:00 – 10:20 a.m.

Break

10:20 – 10:55 a.m.

Session 3: Structural Deep Learning in Conditional Asset Pricing
• Jianqing Fan, Princeton University

Abstract

We develop new structural nonparametric methods, guided by financial economics theory, for estimating conditional asset pricing models using deep neural networks, employing time-varying conditional information on alphas and betas carried by firm-specific characteristics. Contrary to many applications of neural networks in economics, we can open the “black box” of machine learning predictions by incorporating financial economics theory into the learning, and provide an economic interpretation of the successful predictions obtained from neural networks by decomposing the neural predictors into risk-related and mispricing components. Our estimation method starts with period-by-period cross-sectional deep learning, followed by local PCAs to capture time-varying features such as latent factors of the model. We formally establish the asymptotic theory of the structural deep-learning estimators, which applies to both in-sample fit and out-of-sample predictions. We also illustrate the “double-descent risk” phenomenon associated with over-parametrized predictions, which justifies the use of over-fitting machine learning methods. (Joint with Tracy Ke, Yuan Liao, and Andreas Neuhierl)

10:55 – 11:15 a.m.

Break

11:15 – 11:50 a.m.

Session 4: Demystifying Deep Learning
• Robert D. Nowak, University of Wisconsin-Madison

Abstract

We develop a variational framework to understand the properties of functions learned by fitting deep neural networks with rectified linear unit (ReLU) activations to data. We propose a new function space, which is related to classical bounded variation-type spaces, that captures the compositional structure associated with deep neural networks. We derive a representer theorem showing that deep ReLU networks are solutions to regularized data-fitting problems over functions from this space. The function space consists of compositions of functions from the Banach space of second-order bounded variation in the Radon domain. This Banach space has a sparsity-promoting norm, giving insight into the role of sparsity in deep neural networks. The neural network solutions have skip connections and rank-bounded weight matrices, providing new theoretical support for these common architectural choices. The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. Finally, our analysis builds on techniques from variational spline theory, providing new connections between deep neural networks and splines.

11:50 – 12:25 p.m.

Session 5: Properties of Neural Networks, Wide and Deep
• Mikhail Belkin, UC San Diego

12:25 – 1:20 p.m.

Lunch

1:30 – 2:05 p.m.

Session 6: Leave-one-out Singular Subspace Perturbation Analysis for Spectral Clustering
• Harrison Zhou, Yale University

Abstract

Singular subspace perturbation theory is of fundamental importance in probability and statistics, with applications across many fields. We consider two arbitrary matrices, one a leave-one-column-out submatrix of the other, and establish a novel perturbation upper bound on the distance between the two corresponding singular subspaces. The bound is well-suited to mixture models and yields a sharper and finer statistical analysis than classical perturbation bounds such as Wedin’s theorem. Powered by this leave-one-out perturbation theory, we provide a deterministic entrywise analysis of the performance of spectral clustering under mixture models. Our analysis leads to an explicit exponential error rate for the clustering of sub-Gaussian mixture models. For mixtures of isotropic Gaussians, the rate is optimal under a weaker signal-to-noise condition than that of Löffler et al. (2021).
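As a toy illustration of the spectral clustering procedure analyzed in this abstract, the following sketch clusters a two-component isotropic Gaussian mixture by projecting onto the leading singular subspace of the data matrix. The dimensions, separation, and thresholding rule are illustrative assumptions, not the speaker's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 50
labels = rng.integers(0, 2, size=n)
mu = np.zeros(d)
mu[0] = 4.0                                   # cluster centers at +mu and -mu
X = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, mu)

# Spectral clustering: project each row onto the leading right singular
# vector of the data matrix, then split the projections by sign.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
proj = X @ Vt[0]
pred = (proj > 0).astype(int)

# Misclustering rate, up to the global label flip.
err = min(np.mean(pred != labels), np.mean(pred == labels))
print(err)
```

With this much separation the leading singular direction aligns closely with the mean direction, so the misclustering rate is essentially zero; the talk's leave-one-out analysis makes this kind of entrywise guarantee precise.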

2:05 – 2:40 p.m.

Session 7: Toward Understanding Self-Supervised Pre-training 
• Tengyu Ma, Stanford University

Abstract

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, GPT-3) that are pretrained with self-supervised learning and then adapted to a wide range of downstream tasks. Despite this unprecedented empirical success, why and how pretrained models work still largely remains a mystery. This talk will discuss recent works on analyzing contrastive learning, a family of popular self-supervised pretraining methods that learn visual representations/embeddings of images from unlabeled data. We will develop a framework that views contrastive learning as a parametric version of spectral clustering on a so-called population positive-pair graph. We will also analyze the adaptability of the representations and provide sample complexity bounds. Finally, I will briefly discuss two follow-up works that study self-supervised representations’ performance under imbalanced pretraining datasets and for shifting test distributions.

Based on https://arxiv.org/abs/2106.04156, https://arxiv.org/abs/2204.00570, https://arxiv.org/abs/2204.02683, and https://arxiv.org/abs/2110.05025, joint works with Jeff Z. Haochen, Colin Wei, Kendrick Shen, Robbie Jones, Ananya Kumar, Sang Michael Xie, Adrien Gaidon, and Percy Liang.

2:40 – 3:00 p.m.

Break

3:00 – 3:35 p.m.

Session 8: Policy Optimization in Market Equilibrium
• Stefan Wager, Stanford University

Relevant Paper

Treatment Effects in Market Equilibrium

Evan Munro, Stefan Wager, Kuang Xu

In evaluating social programs, it is important to measure treatment effects within a market economy, where interference arises due to individuals buying and selling various goods at the prevailing market price. We introduce a stochastic model of potential outcomes in market equilibrium, where the market price is an exposure mapping. We prove that average direct and indirect treatment effects converge to interpretable mean-field treatment effects, and provide estimators for these effects through a unit-level randomized experiment augmented with randomization in prices. We also provide a central limit theorem for the estimators that depends on the sensitivity of outcomes to prices. For a variant where treatments are continuous, we show that the sum of direct and indirect effects converges to the total effect of a marginal policy change. We illustrate the coverage and consistency properties of the estimators in simulations of different interventions in a two-sided market.

 

Relevant Paper

Experimenting in Equilibrium

Stefan Wager, Kuang Xu

Classical approaches to experimental design assume that intervening on one unit does not affect other units. There are many important settings, however, where this non-interference assumption does not hold, as when running experiments on supply-side incentives on a ride-sharing platform or subsidies in an energy marketplace. In this paper, we introduce a new approach to experimental design in large-scale stochastic systems with considerable cross-unit interference, under an assumption that the interference is structured enough that it can be captured via mean-field modeling. Our approach enables us to accurately estimate the effect of small changes to system parameters by combining unobtrusive randomization with lightweight modeling, all while remaining in equilibrium. We can then use these estimates to optimize the system by gradient descent. Concretely, we focus on the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and show that our approach enables the platform to optimize p in large systems using vanishingly small perturbations.

3:35 – 3:55 p.m.

Break

3:55 – 4:30 p.m.

Session 9: Weighted-Average Quantile Regression
• Denis Chetverikov, UCLA

Abstract

In this paper, we introduce the weighted-average quantile regression model. We argue that this model is of interest in many applied settings and develop an estimator for its parameters. We show that our estimator is √T-consistent and asymptotically normal with mean zero under weak conditions, where T is the sample size. We demonstrate the usefulness of our estimator in two empirical settings. First, we study the factor structures of the expected shortfalls of industry portfolios. Second, we study the dependence of inequality and social welfare on individual characteristics.

4:30 – 5:05 p.m.

Session 10: Controlling the False Split Rate in Tree-Based Aggregation
• Adel Javanmard, University of Southern California

Abstract

A common challenge in data modeling is striking the right balance between models that are sufficiently flexible to adequately describe the phenomenon of interest and those that are simple enough to be easily interpretable. With the ever-increasing complexity and resolution of modern data, this challenge has become even more relevant. In this work we consider this tradeoff within the common context in which data measurements can be associated with the leaves of a known tree that expresses the relationships among these measurements. Such data structures arise in myriad domains, from business to science, including the classification of occupations (US OMB 2018), businesses (US OMB 2017), products, geographic areas, and taxonomies in ecology. In this talk, I will discuss the problem of tree-based aggregation: finding tree-defined subgroups of leaves that should be treated as a single entity, as well as the entities that should be distinguished from each other. We introduce the notion of the “false split rate,” an error measure that describes the degree to which subgroups have been split when they should not have been. We then propose a multiple hypothesis testing algorithm for tree-based aggregation, which we prove controls this error measure. We apply this methodology to aggregate stocks based on their volatility and to aggregate neighborhoods of New York City based on taxi fares. This talk is based on a collaboration with Jacob Bien and Simeng Shao.

5:05 – 6:00 p.m.

Cocktail Reception

Details

Register

Thank you for planning to attend the Griffin Applied Economics Incubator Conference, “Machine Learning Across Disciplines: New Theoretical Developments,” taking place at the David Rubenstein Forum at the University of Chicago June 23, 2022.

The deadline to register is June 1, 2022.

Date

Date: June 23, 2022

Time: 8 a.m. – 6 p.m. CDT


Location

The David Rubenstein Forum | 1201 E. 60th Street, Chicago, IL 60637