Research

Published or Forthcoming

1. (2021) “Inference for Experiments with Matched Pairs” (with Y. Bai and J. P. Romano), forthcoming in the Journal of the American Statistical Association (pdf). Supplementary Appendix (pdf).

Abstract

This paper studies inference for the average treatment effect in randomized controlled trials where treatment status is determined according to a “matched pairs” design. By a “matched pairs” design, we mean that units are sampled i.i.d. from the population of interest, paired according to observed baseline covariates and finally, within each pair, one unit is selected at random for treatment. This type of design is used routinely throughout the sciences, but fundamental questions about its implications for inference about the average treatment effect remain. The main requirement underlying our analysis is that pairs are formed so that units within pairs are suitably “close” in terms of the baseline covariates, and we develop novel results to ensure that pairs are formed in a way that satisfies this condition. Under this assumption, we show that, for the problem of testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, the commonly used two-sample t-test and “matched pairs” t-test are conservative in the sense that these tests have limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. We show, however, that a simple adjustment to the standard errors of these tests leads to a test that is asymptotically exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. We also study the behavior of randomization tests that arise naturally in these types of settings. When implemented appropriately, we show that this approach also leads to a test that is asymptotically exact in the sense described previously, but additionally has finite-sample rejection probability no greater than the nominal level for certain distributions satisfying the null hypothesis. A simulation study and empirical application confirm the practical relevance of our theoretical results.
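
As a concrete illustration of the within-pair randomization test discussed above, the following sketch flips the signs of the within-pair differences at random to build a reference distribution for the hypothesis of a zero average treatment effect. The function name and the inputs y_treated and y_control (outcomes for the treated and untreated unit in each pair) are illustrative, and the sketch omits the studentization analyzed in the paper.

    import numpy as np

    def matched_pairs_randomization_test(y_treated, y_control, n_draws=9999, seed=0):
        """Randomization test of a zero ATE based on sign flips of within-pair differences.

        A stylized sketch, not the paper's exact studentized implementation.
        """
        rng = np.random.default_rng(seed)
        diffs = np.asarray(y_treated, float) - np.asarray(y_control, float)
        observed = np.abs(diffs.mean())
        # Under the null, each within-pair difference is equally likely to appear
        # with either sign, so we resample the signs at random.
        signs = rng.choice([-1.0, 1.0], size=(n_draws, diffs.size))
        reference = np.abs((signs * diffs).mean(axis=1))
        return (1 + np.sum(reference >= observed)) / (1 + n_draws)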

2. (2019) “The Wild Bootstrap with a “Small” Number of “Large” Clusters” (with I. A. Canay and A. Santos), forthcoming in the Review of Economics and Statistics (pdf).

Abstract

This paper studies the properties of the wild bootstrap-based test proposed in Cameron et al. (2008) for testing hypotheses about the coefficients in a linear regression model with clustered data. Cameron et al. (2008) provide simulations that suggest this test works well even in settings with as few as five clusters, but existing theoretical analyses of its properties all rely on an asymptotic framework in which the number of clusters is “large.” In contrast to these analyses, we employ an asymptotic framework in which the number of clusters is “small,” but the number of observations per cluster is “large.” In this framework, we provide conditions under which an unstudentized version of the test is valid in the sense that it has limiting rejection probability under the null hypothesis that does not exceed the nominal level. Importantly, these conditions require, among other things, certain homogeneity restrictions on the distribution of covariates. In contrast, we establish that a studentized version of the test may only over-reject the null hypothesis by a “small” amount in the sense that it has limiting rejection probability under the null hypothesis that does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We obtain results qualitatively similar to those for the studentized version of the test for closely related “score” bootstrap-based tests, which permit testing hypotheses about parameters in nonlinear models. We illustrate the relevance of our theoretical results for applied work via a simulation study and empirical application.
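
For readers unfamiliar with the procedure, here is a minimal sketch of an unstudentized wild cluster bootstrap test of a zero slope in a simple regression with clustered data. The inputs y, x, and cluster and the use of Rademacher weights with the null imposed are illustrative assumptions; the studentized variant analyzed in the paper is not shown.

    import numpy as np

    def wild_cluster_bootstrap_pvalue(y, x, cluster, n_boot=999, seed=0):
        """Unstudentized wild cluster bootstrap p-value for H0: slope = 0 in y = a + b*x + e."""
        rng = np.random.default_rng(seed)
        y, x = np.asarray(y, float), np.asarray(x, float)
        X = np.column_stack([np.ones_like(x), x])

        def slope(y_):
            return np.linalg.lstsq(X, y_, rcond=None)[0][1]

        b_hat = slope(y)
        restricted_resid = y - y.mean()   # residuals with the null (slope = 0) imposed
        groups = np.unique(cluster)
        stats = np.empty(n_boot)
        for b in range(n_boot):
            # One Rademacher weight per cluster, applied to all of that cluster's residuals.
            w = dict(zip(groups, rng.choice([-1.0, 1.0], size=groups.size)))
            y_star = y.mean() + restricted_resid * np.array([w[g] for g in cluster])
            stats[b] = slope(y_star)
        return (1 + np.sum(np.abs(stats) >= np.abs(b_hat))) / (1 + n_boot)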

3. (2019) “Inference under Covariate-Adaptive Randomization with Multiple Treatments” (with F. Bugni and I. A. Canay), Quantitative Economics, Vol. 10, Iss. 4, p. 1747–1785 (pdf).

Abstract

This paper studies inference in randomized controlled trials with covariate-adaptive randomization when there are multiple treatments. More specifically, we study in this setting inference about the average effect of one or more treatments relative to other treatments or a control. As in Bugni et al. (2018), covariate-adaptive randomization refers to randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Importantly, in contrast to Bugni et al. (2018), we not only allow for multiple treatments, but further allow for the proportion of units being assigned to each of the treatments to vary across strata. We first study the properties of estimators derived from a “fully saturated” linear regression, i.e., a linear regression of the outcome on all interactions between indicators for each of the treatments and indicators for each of the strata. We show that tests based on these estimators using the usual heteroskedasticity-consistent estimator of the asymptotic variance are invalid in the sense that they may have limiting rejection probability under the null hypothesis strictly greater than the nominal level; on the other hand, tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact in the sense that they have limiting rejection probability under the null hypothesis equal to the nominal level. For the special case in which the target proportion of units being assigned to each of the treatments does not vary across strata, we additionally consider tests based on estimators derived from a linear regression with “strata fixed effects,” i.e., a linear regression of the outcome on indicators for each of the treatments and indicators for each of the strata. We show that tests based on these estimators using the usual heteroskedasticity-consistent estimator of the asymptotic variance are conservative in the sense that they have limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level, but tests based on these estimators and suitable estimators of the asymptotic variance that we provide are exact, thereby generalizing results in Bugni et al. (2018) for the case of a single treatment to multiple treatments. A simulation study and an empirical application illustrate the practical relevance of our theoretical results.
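
To make the “fully saturated” regression concrete: its coefficients recover within-stratum treatment and control means, so the implied estimate of the average effect of a given treatment relative to the control is a stratum-size-weighted average of within-stratum differences in means. The sketch below, with illustrative inputs y, d (an indicator for the treatment of interest), and stratum, computes this point estimate only; the paper's contribution concerns the accompanying variance estimators.

    import numpy as np

    def saturated_ate_estimate(y, d, stratum):
        """ATE estimate implied by the fully saturated regression (one treatment vs. control)."""
        y, d, stratum = np.asarray(y, float), np.asarray(d), np.asarray(stratum)
        n, ate = y.size, 0.0
        for s in np.unique(stratum):
            m = stratum == s
            diff = y[m & (d == 1)].mean() - y[m & (d == 0)].mean()
            ate += (m.sum() / n) * diff   # weight by the share of units in the stratum
        return ate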

4. (2019) “Instrumental Variables and the Sign of the Average Treatment Effect” (with C. Machado and E. J. Vytlacil), Journal of Econometrics, Vol. 212, Iss. 2, p. 522–555 (pdf).

Abstract

This paper considers identification and inference about the sign of the average effect of a binary endogenous regressor (or treatment) on a binary outcome of interest when a binary instrument is available. In this setting, the average effect of the endogenous regressor on the outcome is sometimes referred to as the average treatment effect (ATE). We consider four different sets of assumptions: instrument exogeneity, instrument exogeneity and monotonicity on the outcome equation, instrument exogeneity and monotonicity on the equation for the endogenous regressor, or instrument exogeneity and monotonicity on both the outcome equation and the equation for the endogenous regressor. For each of these sets of conditions, we characterize when (i) the distribution of the observed data is inconsistent with the assumptions and (ii) the distribution of the observed data is consistent with the assumptions and the sign of ATE is identified. A distinguishing feature of our results is that they are stated in terms of a reduced form parameter from the population regression of the outcome on the instrument. In particular, we find that the reduced form parameter being far enough, but not too far, from zero, implies that the distribution of the observed data is consistent with our assumptions and the sign of ATE is identified, while the reduced form parameter being too far from zero implies that the distribution of the observed data is inconsistent with our assumptions. For each set of restrictions, we also develop methods for simultaneous inference about the consistency of the distribution of the observed data with our restrictions and the sign of the ATE when the distribution of the observed data is consistent with our restrictions. We show that our inference procedures are valid uniformly over a large class of possible distributions for the observed data that include distributions where the instrument is arbitrarily “weak.” A novel feature of the methodology is that the null hypotheses involve unions of moment inequalities.

5. (2019) “Multiple Testing in Experimental Economics” (with J. A. List and Y. Xu), Experimental Economics, Vol. 22, Iss. 4, p. 773–793 (pdf).

Abstract

The analysis of data from experiments in economics routinely involves testing multiple null hypotheses simultaneously. These different null hypotheses arise naturally in this setting for at least three different reasons: when there are multiple outcomes of interest and it is desired to determine on which of these outcomes a treatment has an effect; when the effect of a treatment may be heterogeneous in that it varies across subgroups defined by observed characteristics and it is desired to determine for which of these subgroups a treatment has an effect; and finally when there are multiple treatments of interest and it is desired to determine which treatments have an effect relative to either the control or relative to each of the other treatments. In this paper, we provide a bootstrap-based procedure for testing these null hypotheses simultaneously using experimental data in which simple random sampling is used to assign treatment status to units. Using the general results in Romano and Wolf (2010), we show under weak assumptions that our procedure (i) asymptotically controls the familywise error rate – the probability of one or more false rejections – and (ii) is asymptotically balanced in that the marginal probability of rejecting any true null hypothesis is approximately equal in large samples. Importantly, by incorporating information about dependence ignored in classical multiple testing procedures, such as the Bonferroni and Holm corrections, our procedure has much greater ability to detect truly false null hypotheses. In the presence of multiple treatments, we additionally show how to exploit logical restrictions across null hypotheses to further improve power. We illustrate our methodology by revisiting the study by Karlan and List (2007) of why people give to charitable causes.
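
As a simplified illustration of how the bootstrap exploits dependence across test statistics, the sketch below implements a single-step “max-t” adjustment for testing zero treatment effects on several outcomes under simple random assignment. The inputs Y (an n-by-k matrix of outcomes) and d (a treatment indicator) are illustrative, and the paper's actual procedure is a stepdown, balanced construction rather than this single-step version.

    import numpy as np

    def maxt_fwer_rejections(Y, d, alpha=0.05, n_boot=999, seed=0):
        """Single-step bootstrap max-t test of zero treatment effects on each of k outcomes."""
        rng = np.random.default_rng(seed)
        Y, d = np.asarray(Y, float), np.asarray(d).astype(bool)
        n = Y.shape[0]

        def diff_and_se(Y_, d_):
            m1, m0 = Y_[d_].mean(axis=0), Y_[~d_].mean(axis=0)
            v1, v0 = Y_[d_].var(axis=0, ddof=1), Y_[~d_].var(axis=0, ddof=1)
            return m1 - m0, np.sqrt(v1 / d_.sum() + v0 / (~d_).sum())

        diff, se = diff_and_se(Y, d)
        max_null = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, n)
            diff_b, se_b = diff_and_se(Y[idx], d[idx])
            # Recenter the bootstrap statistics at the sample estimates to mimic the null.
            max_null[b] = np.max(np.abs(diff_b - diff) / se_b)
        return np.abs(diff / se) > np.quantile(max_null, 1 - alpha)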

6. (2018) “The Econometrics of Shape Restrictions” (with D. Chetverikov and A. Santos), Annual Review of Economics, Vol. 10, p. 31-63 (pdf).

Abstract

We review recent developments in the econometrics of shape restrictions and their role in applied work. Our objectives are threefold. First, we aim to emphasize the diversity of applications in which shape restrictions have played a fruitful role. Second, we intend to provide practitioners with an intuitive understanding of how shape restrictions impact the distribution of estimators and test statistics. Third, we aim to provide an overview of new advances in the theory of estimation and inference under shape restrictions. Throughout the review, we outline open questions and interesting directions for future research.

7. (2018) “Inference under Covariate-Adaptive Randomization” (with F. Bugni and I. A. Canay), Journal of the American Statistical Association, Vol. 113, Iss. 524, p. 1784-1796 (pdf).

Abstract

This paper studies inference for the average treatment effect in randomized controlled trials with covariate-adaptive randomization. Here, by covariate-adaptive randomization, we mean randomization schemes that first stratify according to baseline covariates and then assign treatment status so as to achieve “balance” within each stratum. Our main requirement is that the randomization scheme assigns treatment status within each stratum so that the fraction of units being assigned to treatment within each stratum has a well-behaved distribution centered around a proportion π as the sample size tends to infinity. Such schemes include, for example, Efron’s biased-coin design and stratified block randomization. When testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, we first show the usual two-sample t-test is conservative in the sense that it has limiting rejection probability under the null hypothesis no greater than and typically strictly less than the nominal level. We show, however, that a simple adjustment to the usual standard error of the two-sample t-test leads to a test that is exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. Next, we consider the usual t-test (on the coefficient on treatment assignment) in a linear regression of outcomes on treatment assignment and indicators for each of the strata. We show that this test is exact for the important special case of randomization schemes with π = 1/2, but is otherwise conservative. We again provide a simple adjustment to the standard error that yields an exact test more generally. Finally, we study the behavior of a modified version of a permutation test, which we refer to as the covariate-adaptive permutation test, that only permutes treatment status for units within the same stratum. When applied to the usual two-sample t-statistic, we show that this test is exact for randomization schemes with π = 1/2 and that additionally achieve what we refer to as “strong balance.” For randomization schemes with π ≠ 1/2, this test may have limiting rejection probability under the null hypothesis strictly greater than the nominal level. When applied to a suitably adjusted version of the two-sample t-statistic, however, we show that this test is exact for all randomization schemes that achieve “strong balance,” including those with π ≠ 1/2. A simulation study confirms the practical relevance of our theoretical results. We conclude with recommendations for empirical practice and an empirical illustration.
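
The covariate-adaptive permutation test described above can be sketched in a few lines: treatment labels are permuted only within strata, and the test statistic is recomputed on each permuted assignment. The inputs y, d, and stratum are illustrative, and the sketch uses the simple difference in means rather than the (suitably adjusted) t-statistics studied in the paper.

    import numpy as np

    def within_stratum_permutation_test(y, d, stratum, n_perm=999, seed=0):
        """Permutation test that permutes treatment status only within strata."""
        rng = np.random.default_rng(seed)
        y, d, stratum = np.asarray(y, float), np.asarray(d), np.asarray(stratum)

        def stat(d_):
            return y[d_ == 1].mean() - y[d_ == 0].mean()

        observed = stat(d)
        perm_stats = np.empty(n_perm)
        for b in range(n_perm):
            d_perm = d.copy()
            for s in np.unique(stratum):
                idx = np.flatnonzero(stratum == s)
                d_perm[idx] = rng.permutation(d[idx])   # shuffle labels within the stratum
            perm_stats[b] = stat(d_perm)
        return (1 + np.sum(np.abs(perm_stats) >= np.abs(observed))) / (1 + n_perm)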

8. (2017) “Keeping the ECON in Econometrics: (Micro-)Econometrics in the Journal of Political Economy” (with S. Bonhomme), Journal of Political Economy (Special Issue on the 125th Anniversary of the Journal), Vol. 125, Iss. 6, p. 1846-1853 (pdf).

Abstract

In 1970, John Siegfried wrote an instructive short note in the miscellany section of the Journal of Political Economy titled “A First Lesson in Econometrics” (Siegfried 1970). The author starts by writing the equation “1 + 1 = 2” but immediately argues that “every budding econometrician must learn early that it is never in good taste to express the sum of two quantities in [this] form” (1378). He then produces two pages of intricate derivations to arrive at an equivalent but extremely cumbersome expression. From the publication of this note, it is reasonable to infer that the JPE’s editorial team at the time had some level of distrust in sophisticated econometric analysis. Shortly thereafter, however, the journal began to play a key role in the development of several novel econometric ideas. Compared to many of its competitors, the type of econometric research the JPE has published has two distinctive features. The first one is the promotion of a type of econometric work that is tightly connected to economic models. In particular, the JPE has been a leading vehicle for structural econometric modeling. The second main feature is the emphasis on empirical applications of the methodology. The JPE seldom publishes abstract econometric theory. Instead, it promotes econometric analysis mainly through applications. In agreement with the motto of the Cowles Commission, the JPE’s style of econometrics is one in which theory and measurement go hand in hand. Since trying to review all of the econometrics research in the JPE would be a daunting task, we will focus on only a handful of contributions, each of which links economics and econometrics in particularly insightful ways. Such a choice necessarily means leaving aside a large number of equally important and influential contributions. In the same spirit, this review will be limited mainly to microeconometric applications, abstracting from key contributions to time-series econometrics, macroeconometrics, and finance that have appeared in the journal.

9. (2017) “Practical and Theoretical Advances for Inference in Partially Identified Models” (with I. A. Canay), in B. Honore, A. Pakes, M. Piazzesi, & L. Samuelson, eds., Advances in Economics and Econometrics: 11th World Congress (Econometric Society Monographs), p. 271-306 (pdf).

Abstract

This paper surveys some of the recent literature on inference in partially identified models. After reviewing some basic concepts, including the definition of a partially identified model and the identified set, we turn our attention to the construction of confidence regions in partially identified settings. In our discussion, we emphasize the importance of requiring confidence regions to be uniformly consistent in level over relevant classes of distributions. Due to space limitations, our survey is mainly limited to the class of partially identified models in which the identified set is characterized by a finite number of moment inequalities or the closely related class of partially identified models in which the identified set is a function of such a set. The latter class of models most commonly arises when interest focuses on a subvector of a vector-valued parameter whose values are limited by a finite number of moment inequalities. We then rapidly review some important parts of the broader literature on inference in partially identified models and conclude by providing some thoughts on fruitful directions for future research.

10. (2017) “Randomization Tests under an Approximate Symmetry Assumption” (with I. A. Canay and J. P. Romano), Econometrica, Vol. 85, No. 3, p. 1013-1030 (pdf). Supplementary Appendix (pdf).

Abstract

This paper develops a theory of randomization tests under an approximate symmetry assumption. Randomization tests provide a general means of constructing tests that control size in finite samples whenever the distribution of the observed data exhibits symmetry under the null hypothesis. Here, by exhibits symmetry we mean that the distribution remains invariant under a group of transformations. In this paper, we provide conditions under which the same construction can be used to construct tests that asymptotically control the probability of a false rejection whenever the distribution of the observed data exhibits approximate symmetry in the sense that the limiting distribution of a function of the data exhibits symmetry under the null hypothesis. An important application of this idea is in settings where the data may be grouped into a fixed number of “clusters” with a large number of observations within each cluster. In such settings, we show that the distribution of the observed data satisfies our approximate symmetry requirement under weak assumptions. In particular, our results allow for the clusters to be heterogeneous and also have dependence not only within each cluster, but also across clusters. This approach enjoys several advantages over other approaches in these settings.
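
In the clustered-data application described above, the construction amounts to estimating the parameter separately within each cluster and comparing a test statistic against its distribution under the group of sign changes. The sketch below, with an illustrative input theta_hat holding the q cluster-level estimates, conveys this logic; it is not a substitute for the paper's general treatment.

    import numpy as np
    from itertools import product

    def cluster_sign_change_test(theta_hat, theta_0=0.0, alpha=0.05):
        """Randomization test based on sign changes of recentered cluster-level estimates."""
        s = np.asarray(theta_hat, float) - theta_0
        q = s.size

        def t_stat(x):
            return np.abs(np.sqrt(q) * x.mean() / x.std(ddof=1))

        observed = t_stat(s)
        # Enumerate all 2^q sign changes (feasible because q is small, e.g. 5-10 clusters).
        stats = np.array([t_stat(np.array(g) * s) for g in product([-1.0, 1.0], repeat=q)])
        p_value = np.mean(stats >= observed)
        return p_value, p_value <= alpha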

11. (2014) “Multiple Testing and Heterogeneous Treatment Effects: Re-evaluating the Effect of PROGRESA on School Enrollment” (with S. Lee), Journal of Applied Econometrics, Vol. 29, Iss. 4, p. 612-626 (pdf).

Abstract

The effect of a program or treatment may vary according to observed characteristics. In such a setting, it may not only be of interest to determine whether the program or treatment has an effect on some sub-population defined by these observed characteristics, but also to determine for which sub-populations, if any, there is an effect. This paper treats this problem as a multiple testing problem in which each null hypothesis in the family of null hypotheses specifies whether the program has an effect on the outcome of interest for a particular sub-population. We develop our methodology in the context of PROGRESA, a large-scale poverty-reduction program in Mexico. In our application, the outcome of interest is the school enrollment rate and the sub-populations are defined by gender and highest grade completed. Under weak assumptions, the testing procedure we construct controls the familywise error rate—the probability of even one false rejection—in finite samples. Similar to earlier studies, we find that the program has a significant effect on the school enrollment rate, but only for a much smaller number of sub-populations when compared to results that do not adjust for multiple testing.

12. (2014) “A Practical Two-Step Method for Testing Moment Inequalities” (with J. P. Romano and Michael Wolf), Econometrica, Vol. 82, No. 5, p. 1979-2002 (pdf). Supplementary Appendix (pdf).

Abstract

This paper considers the problem of testing a finite number of moment inequalities. We propose a two-step approach. In the first step, a confidence region for the moments is constructed. In the second step, this set is used to provide information about which moments are “negative.” A Bonferroni-type correction is used to account for the fact that, with some probability, the moments may not lie in the confidence region. It is shown that the test controls size uniformly over a large class of distributions for the observed data. An important feature of the proposal is that it remains computationally feasible, even when the number of moments is large. The finite-sample properties of the procedure are examined via a simulation study, which demonstrates, among other things, that the proposal remains competitive with existing procedures while being computationally more attractive.
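
A stylized sketch of the two-step logic follows, for data W given as an n-by-k matrix of moment functions and the null hypothesis that every moment has non-positive mean. The choice beta = alpha/10, the simple nonparametric bootstrap, and the truncation of the critical value at zero are illustrative simplifications meant to convey the structure of the procedure rather than to reproduce the paper's implementation.

    import numpy as np

    def two_step_moment_inequality_test(W, alpha=0.05, n_boot=999, seed=0):
        """Two-step test of H0: E[W_ij] <= 0 for every moment j (reject = True)."""
        rng = np.random.default_rng(seed)
        W = np.asarray(W, float)
        n, k = W.shape
        beta = alpha / 10
        mu_hat, sd_hat = W.mean(axis=0), W.std(axis=0, ddof=1)

        # Bootstrap draws of the recentered, studentized moments.
        boot = np.empty((n_boot, k))
        for b in range(n_boot):
            Wb = W[rng.integers(0, n, n)]
            boot[b] = np.sqrt(n) * (Wb.mean(axis=0) - mu_hat) / Wb.std(axis=0, ddof=1)

        # Step 1: joint upper confidence bounds for the moments at level 1 - beta.
        c1 = np.quantile(np.max(-boot, axis=1), 1 - beta)
        upper = mu_hat + sd_hat * c1 / np.sqrt(n)

        # Step 2: critical value at level alpha - beta, recentering each moment at
        # min(upper bound, 0) so that clearly slack moments are discounted.
        shift = np.sqrt(n) * np.minimum(upper, 0.0) / sd_hat
        c2 = np.quantile(np.max(boot + shift, axis=1), 1 - (alpha - beta))

        return np.max(np.sqrt(n) * mu_hat / sd_hat) > max(c2, 0.0)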

13. (2013) “On the Testability of Identification in Some Nonparametric Models with Endogeneity” (with I. A. Canay and A. Santos), Econometrica, Vol. 81, No. 6, p. 2535-2559 (pdf).

Abstract

This paper examines three distinct hypothesis testing problems that arise in the context of identification of some nonparametric models with endogeneity. The first hypothesis testing problem we study concerns testing necessary conditions for identification in some nonparametric models with endogeneity involving mean independence restrictions. These conditions are typically referred to as completeness conditions. The second and third hypothesis testing problems we examine concern testing for identification directly in some nonparametric models with endogeneity involving quantile independence restrictions. For each of these hypothesis testing problems, we provide conditions under which any test will have power no greater than size against any alternative. In this sense, we conclude that no nontrivial tests for these hypothesis testing problems exist.

14. (2012) “On the Asymptotic Optimality of Empirical Likelihood for Testing Moment Restrictions” (with Y. Kitamura and A. Santos), Econometrica, Vol. 80, No. 1, p. 413-423 (pdf). Supplemental Appendix (pdf).

Abstract

We show by example that empirical likelihood and other commonly used tests for moment restrictions are unable to control the (exponential) rate at which the probability of a Type I error tends to zero unless the possible distributions for the observed data are restricted appropriately. From this, it follows that for the optimality claim for empirical likelihood in Kitamura (2001) to hold, additional assumptions and qualifications are required. Under stronger assumptions than those in Kitamura (2001), we establish the following optimality result: (i) empirical likelihood controls the rate at which the probability of a Type I error tends to zero and (ii) among all procedures for which the probability of a Type I error tends to zero at least as fast, empirical likelihood maximizes the rate at which the probability of a Type II error tends to zero for most alternatives. This result further implies that empirical likelihood maximizes the rate at which the probability of a Type II error tends to zero for all alternatives among a class of tests that satisfy a weaker criterion for their Type I error probabilities.

15. (2012) “Treatment Effect Bounds: An Application to Swan-Ganz Catheterization” (with J. Bhattacharya and E. J. Vytlacil), Journal of Econometrics, Vol. 168, Iss. 2, p. 223-243 (pdf).

Abstract

We reanalyze data from the observational study by Connors et al. (1996) on the impact of Swan-Ganz catheterization on mortality outcomes. The Connors et al. (1996) study assumes that there are no unobserved differences between patients who are catheterized and patients who are not catheterized and finds that catheterization increases patient mortality. We instead allow for such differences between patients by implementing both the instrumental variable bounds of Manski (1990), which only exploit an instrumental variable, and the bounds of Shaikh and Vytlacil (2011), which exploit mild nonparametric, structural assumptions in addition to an instrumental variable. We propose and justify the use of indicators of weekday admission as an instrument for catheterization in this context. We find that in our application, the Manski (1990) bounds do not indicate whether catheterization increases or decreases mortality, whereas the Shaikh and Vytlacil (2011) bounds reveal that catheterization increases mortality at 30 days and beyond. We show that the bounds of Shaikh and Vytlacil (2011) remain valid under even weaker assumptions than those described in Shaikh and Vytlacil (2011). We also extend the analysis to exploit a further nonparametric, structural assumption – that doctors catheterize individuals with systematically worse latent health – and find that this assumption further narrows these bounds and strengthens our conclusions. In our analysis, we construct confidence regions using the methodology developed in Romano and Shaikh (2008). We show in particular that the confidence regions are uniformly consistent in level over a large class of possible distributions for the observed data that include distributions where the instrument is arbitrarily “weak.”
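
For reference, the Manski (1990) bounds used in the paper have a simple plug-in form when the outcome is binary. In the sketch below, y, d, and z are illustrative arrays holding the binary outcome, treatment, and instrument; the Shaikh and Vytlacil (2011) bounds and the inference step based on Romano and Shaikh (2008) are not attempted here.

    import numpy as np

    def manski_iv_bounds(y, d, z):
        """Plug-in Manski (1990) instrumental variable bounds on the ATE of binary d on binary y."""
        y, d, z = (np.asarray(a, float) for a in (y, d, z))
        lo1, up1, lo0, up0 = [], [], [], []
        for v in np.unique(z):
            m = z == v
            p1 = d[m].mean()                   # P(D = 1 | Z = v)
            ey1 = (y[m] * d[m]).mean()         # E[Y D | Z = v]
            ey0 = (y[m] * (1 - d[m])).mean()   # E[Y (1 - D) | Z = v]
            lo1.append(ey1); up1.append(ey1 + (1 - p1))   # bounds on E[Y(1)] given Z = v
            lo0.append(ey0); up0.append(ey0 + p1)         # bounds on E[Y(0)] given Z = v
        # Mean independence of the potential outcomes from Z allows intersecting over v.
        L1, U1, L0, U0 = max(lo1), min(up1), max(lo0), min(up0)
        return L1 - U0, U1 - L0   # lower and upper bounds on the ATE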

16. (2012) “On the Uniform Asymptotic Validity of Subsampling and the Bootstrap” (with J. P. Romano), Annals of Statistics, Vol. 40, No. 6, p. 2798-2822 (pdf). Supplementary Appendix (pdf).

Abstract

This paper provides conditions under which subsampling and the bootstrap can be used to construct estimators of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions P. These results are then applied (i) to construct confidence regions that behave well uniformly over P in the sense that the coverage probability tends to at least the nominal level uniformly over P and (ii) to construct tests that behave well uniformly over P in the sense that the size tends to no greater than the nominal level uniformly over P. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor, even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process and U-statistics.
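
To fix notation, a textbook-style sketch of how subsampling approximates the quantiles of a root is given below for the sample mean. The subsample size b and the random subsampling scheme are illustrative choices; the paper's contribution is to characterize when such approximations (and their bootstrap analogues) are valid uniformly over large classes of distributions.

    import numpy as np

    def subsampling_confidence_interval(x, b, alpha=0.05, n_sub=999, seed=0):
        """Subsampling confidence interval for the mean via the root sqrt(n)(mean_hat - mu)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x, float)
        n, theta_hat = x.size, x.mean()
        roots = np.empty(n_sub)
        for i in range(n_sub):
            sub = rng.choice(x, size=b, replace=False)        # a subsample, not a bootstrap sample
            roots[i] = np.sqrt(b) * (sub.mean() - theta_hat)  # recentered, rescaled root
        q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
        # Invert the quantiles of the root to obtain a confidence interval for the mean.
        return theta_hat - q_hi / np.sqrt(n), theta_hat - q_lo / np.sqrt(n)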

17. (2011) “Consonance and the Closure Method in Multiple Testing” (with J. P. Romano and Michael Wolf), International Journal of Biostatistics, Vol. 7, Iss. 1, Art. 12 (pdf).

Abstract

Consider the problem of testing s null hypotheses simultaneously. In order to deal with the multiplicity problem, the classical approach is to restrict attention to multiple testing procedures that control the familywise error rate (FWE). The closure method of Marcus et al. (1976) reduces the problem of constructing such procedures to one of constructing single tests that control the usual probability of a Type I error. It was shown by Sonnemann (1982, 2008) that any coherent multiple testing procedure can be constructed using the closure method. Moreover, it was shown by Sonnemann and Finner (1988) that any incoherent multiple testing procedure can be replaced by a coherent multiple testing procedure which is at least as good. In this paper, we first show an analogous result for dissonant and consonant multiple testing procedures. We show further that, in many cases, the improvement of the consonant multiple testing procedure over the dissonant multiple testing procedure may in fact be strict in the sense that it has strictly greater probability of detecting a false null hypothesis while still maintaining control of the FWE. Finally, we show how consonance can be used in the construction of some optimal maximin multiple testing procedures. This last result is especially of interest because there are very few results on optimality in the multiple testing literature.

18. (2011) “Partial Identification in Triangular Systems of Equations with Binary Dependent Variables” (with E. J. Vytlacil), Econometrica, Vol. 79, No. 3, p. 949-955 (pdf). Supplemental Appendix (pdf).

Abstract

This paper studies the special case of the triangular system of equations in Vytlacil and Yildiz (2007), where both dependent variables are binary but without imposing the restrictive support condition required by Vytlacil and Yildiz (2007) for identification of the average structural function (ASF) and the average treatment effect (ATE). Under weak regularity conditions, we derive upper and lower bounds on the ASF and the ATE. We show further that the bounds on the ASF and ATE are sharp under some further regularity conditions and an additional restriction on the support of the covariates and the instrument.

19. (2010) “Inference for the Identified Set in Partially Identified Econometric Models” (with J. P. Romano), Econometrica, Vol. 78, No. 1, p. 169-211 (pdf).

Abstract

This paper provides computationally intensive, yet feasible methods for inference in a very general class of partially identified econometric models. Let P denote the distribution of the observed data. The class of models we consider is defined by a population objective function Q(θ, P) for θ ∈ Θ. The point of departure from the classical extremum estimation framework is that it is not assumed that Q(θ, P) has a unique minimizer in the parameter space Θ. The goal may be either to draw inferences about some unknown point in the set of minimizers of the population objective function or to draw inferences about the set of minimizers itself. In this paper, the object of interest is Θ0(P) = arg min_{θ ∈ Θ} Q(θ, P), and so we seek random sets that contain this set with at least some prespecified probability asymptotically. We also consider situations where the object of interest is the image of Θ0(P) under a known function. Random sets that satisfy the desired coverage property are constructed under weak assumptions. Conditions are provided under which the confidence regions are asymptotically valid not only pointwise in P, but also uniformly in P. We illustrate the use of our methods with an empirical study of the impact of top-coding outcomes on inferences about the parameters of a linear regression. Finally, a modest simulation study sheds some light on the finite-sample behavior of our procedure.

20. (2010) “Hypothesis Testing in Econometrics” (with J. P. Romano and Michael Wolf), Annual Review of Economics, Vol. 2, p. 75-104 (pdf).

Abstract

This article reviews important concepts and methods that are useful for hypothesis testing. First, we discuss the Neyman-Pearson framework. Various approaches to optimality are presented, including finite-sample and large-sample optimality. Then, we summarize some of the most important methods, as well as resampling methodology, which is useful to set critical values. Finally, we consider the problem of multiple testing, which has witnessed a burgeoning literature in recent years. Along the way, we incorporate some examples that are current in the econometrics literature. While many problems with well-known successful solutions are included, we also address open problems that are not easily handled with current technology, stemming from such issues as lack of optimality or poor asymptotic approximations.

21. (2010) “Multiple Testing” (with J. P. Romano and Michael Wolf), New Palgrave Dictionary of Economics (Online Edition) (pdf).

Abstract

Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about the individual hypotheses are based on the unadjusted marginal p-values, then there is typically a large probability that some of the true null hypotheses will be rejected. Unfortunately, such a course of action is still common. In this article, we describe the problem of multiple testing more formally and discuss methods which account for the multiplicity issue. In particular, recent developments based on resampling result in an improved ability to reject false hypotheses compared to classical methods such as Bonferroni.
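
For concreteness, the two classical corrections mentioned above can be written in a few lines. The sketch below takes an array of marginal p-values and returns which hypotheses each method rejects; the resampling-based improvements discussed in the entry are not shown.

    import numpy as np

    def bonferroni_holm(p_values, alpha=0.05):
        """Rejection indicators for the Bonferroni and Holm corrections (both control the FWER)."""
        p = np.asarray(p_values, float)
        s = p.size
        bonferroni = p <= alpha / s

        # Holm: step down through the ordered p-values with increasingly lenient thresholds.
        holm = np.zeros(s, dtype=bool)
        for rank, idx in enumerate(np.argsort(p)):
            if p[idx] <= alpha / (s - rank):
                holm[idx] = True
            else:
                break   # once one ordered hypothesis is not rejected, stop
        return bonferroni, holm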

22. (2009) “A Specification Test for the Propensity Score using its Distribution Conditional on Participation” (with M. Simonsen, E. J. Vytlacil, and N. Yildiz), Journal of Econometrics, Vol. 151, Iss. 1, p. 33-46 (pdf).

Abstract

Propensity score matching has become a popular method for the estimation of average treatment effects. In empirical applications, researchers almost always impose a parametric model for the propensity score. This practice raises the possibility that the model for the propensity score is misspecified and therefore the propensity score matching estimator of the average treatment effect may be inconsistent. We show that the common practice of calculating estimates of the densities of the propensity score conditional on the participation decision provides a means for examining whether the propensity score is misspecified. In particular, we derive a restriction between the density of the propensity score among participants and the density among nonparticipants. We show that this restriction between the two conditional densities is equivalent to a particular orthogonality restriction and derive a formal test based upon it. The resulting test is shown via a simulation study to have dramatically greater power than competing tests for many alternatives. The principal disadvantage of this approach is loss of power against some alternatives.
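
A short derivation conveys the type of cross-density restriction the abstract refers to: if the propensity score P is correctly specified, so that Pr(D = 1 | P = p) = p, then Bayes' rule links the density of P among participants to the density among nonparticipants. The display below is a sketch of this logic, not necessarily the exact formulation tested in the paper.

    \begin{align*}
      f_{P \mid D=1}(p) &= \frac{\Pr(D=1 \mid P=p)\, f_P(p)}{\Pr(D=1)} = \frac{p\, f_P(p)}{\Pr(D=1)}, \\
      f_{P \mid D=0}(p) &= \frac{(1-p)\, f_P(p)}{\Pr(D=0)}, \\
      \text{so that}\quad
      (1-p)\,\Pr(D=1)\, f_{P \mid D=1}(p) &= p\,\Pr(D=0)\, f_{P \mid D=0}(p)
      \quad \text{for all } p.
    \end{align*}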

23. (2008) “Endogenous Binary Choice Models with Median Restrictions: A Comment” (with E. J. Vytlacil), Economics Letters, Vol. 98, Iss. 1, p. 23-28 (pdf).

Abstract

Hong and Tamer [Hong, H. and Tamer, E. (2003). Endogenous binary choice model with median restrictions. Economics Letters, 80, 219–225] provide a sufficient condition for identification of a binary choice model with endogenous regressors. For a special case of their model, we show that this condition essentially requires that the endogenous regressor is degenerate conditional on the instrument with positive probability. Moreover, under weak assumptions, we show that this condition fails to rule out any possible value for the coefficient on the endogenous regressor.

24. (2008) “Formalized Data Snooping Based on Generalized Error Rates” (with J. P. Romano and Michael Wolf), Econometric Theory, Vol. 24, Iss. 2, p. 404-447 (pdf).

Abstract

It is common in econometric applications that several hypothesis tests are carried out simultaneously. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. The classical approach is to control the familywise error rate (FWE), which is the probability of one or more false rejections. But when the number of hypotheses under consideration is large, control of the FWE can become too demanding. As a result, the number of false hypotheses rejected may be small or even zero. This suggests replacing control of the FWE by a more liberal measure. To this end, we review a number of recent proposals from the statistical literature. We briefly discuss how these procedures apply to the general problem of model selection. A simulation study and two empirical applications illustrate the methods.

25. (2008) “Treatment Effect Bounds under Monotonicity Assumptions: An Application to Swan-Ganz Catheterization” (with J. Bhattacharya and E. J. Vytlacil), American Economic Review Papers and Proceedings, Vol. 98, No. 2, p. 351-356 (pdf).

Abstract

We consider different bounds on the average effect of a treatment that follow from access to an instrument combined with alternative monotonicity restrictions. We consider three alternative sets of nonnested, structural restrictions:

• The “monotone treatment response” (MTR) assumption proposed by Charles F. Manski and John Pepper (2000), hereafter MP, that imposes a priori the restriction that the outcome is increasing in the treatment;

• The MTR assumption that imposes a priori the restriction that the outcome is decreasing in the treatment; and

• The restrictions of Shaikh and Vytlacil (2005), hereafter SV, that impose monotonicity of the outcome in the treatment and of the treatment in the instrument, but do not impose the direction of the monotonicity in either case.

We use these different approaches to study the effects of Swan-Ganz catheterization on patient mortality. In Section I, we describe each of the resulting bounds when there are no other exogenous covariates that directly affect the outcome. We show that if the effect of the treatment is positive and the assumptions of SV hold, then the bounds of SV coincide with those of MP that assume a priori that the effect of the treatment is positive. If the effect of the treatment is instead negative and the assumptions of SV hold, then the bounds of SV coincide with those of MP that assume a priori that the effect of the treatment is negative. Hence, the trade-off between the analyses of SV and MP in the case of no exogenous covariates besides the instrument is that the latter requires one to know a priori whether the effect of the treatment is positive or negative, while the former requires one to impose monotonicity of the treatment in the instrument in order to be able to determine the sign of the treatment effect from the distribution of the observed data. If there are exogenous regressors that vary conditional on the fitted value of the treatment, then the SV bounds become much narrower than the MP bounds. We show further that it is not possible to determine the sign of the treatment effect in the same way as SV under the assumptions of MP. Current work by Cecilia Machado, Shaikh, and Vytlacil (2008) develops the sharp bounds for the average treatment effect under the restriction that the outcome is monotone in the treatment, but without assuming the direction of the monotonicity a priori or that the treatment is monotone in the instrument. In Section II, we construct bounds on the average effect of Swan-Ganz catheterization on patient mortality under each of these three sets of assumptions. The data used are the same as in the influential observational study on the effect of Swan-Ganz catheterization on patient mortality by A. Connors et al. (1996). This study assumes that there are no unobserved differences between patients who are catheterized and patients who are not catheterized, and finds that catheterization increases patient mortality 180 days after admission to the intensive care unit (ICU). The three approaches described above permit such differences, but require an instrument. We propose and justify the use of an indicator for weekend admission to the ICU as an instrument for catheterization in this context. Under the assumptions of SV, Bhattacharya, Shaikh, and Vytlacil (2005) find that catheterization increases patient mortality at all time horizons beyond seven days after admission to the ICU. We expand this analysis here to consider the assumptions of MP.

26. (2008) “Inference for Identifiable Parameters in Partially Identified Econometric Models” (with J. P. Romano), Journal of Statistical Planning and Inference (Special Issue in Honor of T. W. Anderson, Jr. on the Occasion of his 90th Birthday), Vol. 138, Iss. 9, p. 2786-2807 (pdf).

Abstract

This paper considers the problem of inference for partially identified econometric models. The class of models studied is defined by a population objective function Q(θ, P) for θ ∈ Θ. The second argument indicates the dependence of the objective function on P, the distribution of the observed data. Unlike the classical extremum estimation framework, it is not assumed that Q(θ, P) has a unique minimizer in the parameter space Θ. The goal may be either to draw inferences about some unknown point in the set of minimizers of the population objective function or to draw inferences about the set of minimizers itself. In this paper, the object of interest is some unknown point θ ∈ Θ0(P), where Θ0(P) = arg min_{θ ∈ Θ} Q(θ, P), and so we seek random sets that contain each θ ∈ Θ0(P) with at least some prespecified probability asymptotically. We also consider situations where the object of interest is the image of some point θ ∈ Θ0(P) under a known function. Computationally intensive, yet feasible procedures for constructing random sets satisfying the desired coverage property under weak assumptions are provided. We also provide conditions under which the confidence regions are uniformly consistent in level.

27. (2008) “Control of the False Discovery Rate under Dependence using the Bootstrap and Subsampling (with Discussion and Rejoinder)” (with J. P. Romano and Michael Wolf), TEST, Vol. 17, No. 3, p. 417-442 (pdf) and 461-471 (pdf).

Abstract

This paper considers the problem of testing s null hypotheses simultaneously while controlling the false discovery rate (FDR). Benjamini and Hochberg (J. R. Stat. Soc. Ser. B 57(1):289–300, 1995) provide a method for controlling the FDR based on p-values for each of the null hypotheses under the assumption that the p-values are independent. Subsequent research has since shown that this procedure is valid under weaker assumptions on the joint distribution of the p-values. Related procedures that are valid under no assumptions on the joint distribution of the p-values have also been developed. None of these procedures, however, incorporate information about the dependence structure of the test statistics. This paper develops methods for control of the FDR under weak assumptions that incorporate such information and, by doing so, are better able to detect false null hypotheses. We illustrate this property via a simulation study and two empirical applications. In particular, the bootstrap method is competitive with methods that require independence if independence holds, but it outperforms these methods under dependence.

28. (2008) “Discussion: On Methods Controlling the False Discovery Rate” (with J. P. Romano and Michael Wolf), Sankhya, Vol. 70, Part 2, p. 169-176 (pdf).

Abstract

It is a pleasure to acknowledge another insightful article by Sarkar. By developing clever expressions for the FDP, FDR, and FNR, he manages to prove fundamental properties of stepdown and stepup methods. It is particularly important that the theory is sufficiently developed so as to apply to what Sarkar calls adaptive BH methods. Here, the goal is to improve upon the Benjamini-Hochberg procedure by incorporating a data-dependent estimate of the number of true null hypotheses. Theoretical justification of such methods is vital and Sarkar’s analysis is useful for this purpose. A perhaps more ambitious task is to develop methods which implicitly or explicitly estimate the joint dependence structure of the test statistics (or p-values). The focus of our discussion is to show how resampling methods can be used to construct stepdown procedures which control the FDR or other general measures of error control. The main benefit of using the bootstrap or subsampling is the ability to estimate the joint distribution of the test statistics, and thereby offer the potential of improving upon methods based on the marginal distributions of test statistics. The procedure below is a generalization of one we developed for FDR control, and the utility of the bootstrap is that it can apply to essentially arbitrary measures of error control, such as the pairwise FDR of Sarkar, the k-FWER, or the tail probabilities of the false discovery proportion. However, it is important to note that the justification of such methods is asymptotic.

29. (2006) “On Stepdown Control of the False Discovery Proportion” (with J. P. Romano), in J. Rojo, ed., IMS Lecture Notes – Monograph Series, 2nd Lehmann Symposium – Optimality, Vol. 49, p. 33-50 (pdf).

Abstract

Consider the problem of testing multiple null hypotheses. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. However, if the number of null hypotheses s is large, control of the FWER is so stringent that the ability of a procedure which controls the FWER to detect false null hypotheses is limited. Consequently, it is desirable to consider other measures of error control. We will consider methods based on control of the false discovery proportion (FDP) defined by the number of false rejections divided by the total number of rejections (defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg (1995) controls E(FDP). Here, we construct methods such that, for any γ and α, P{FDP > γ} ≤ α. Based on p-values of individual tests, we consider stepdown procedures that control the FDP, without imposing dependence assumptions on the joint distribution of the p-values. A greatly improved version of a method given in Lehmann and Romano [10] is derived and generalized to provide a means by which any sequence of nondecreasing constants can be rescaled to ensure control of the FDP. We also provide a stepdown procedure that controls the FDR under a dependence assumption.

30. (2006) “Stepup Procedures for Control of Generalizations of the Familywise Error Rate” (with J. P. Romano), Annals of Statistics, Vol. 34, No. 4., p. 1850-1873 (pdf).

Abstract

Consider the multiple testing problem of testing null hypotheses H1, . . . , Hs. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. But if s is large, control of the FWER is so stringent that the ability of a procedure that controls the FWER to detect false null hypotheses is limited. It is therefore desirable to consider other measures of error control. This article considers two generalizations of the FWER. The first is the k-FWER, in which one is willing to tolerate k or more false rejections for some fixed k ≥ 1. The second is based on the false discovery proportion (FDP), defined to be the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] proposed control of the false discovery rate (FDR), by which they meant that, for fixed α, E(FDP) ≤ α. Here, we consider control of the FDP in the sense that, for fixed γ and α, P{FDP > γ} ≤ α. Beginning with any nondecreasing sequence of constants and p-values for the individual tests, we derive stepup procedures that control each of these two measures of error control without imposing any assumptions on the dependence structure of the p-values. We use our results to point out a few interesting connections with some closely related stepdown procedures. We then compare and contrast two FDP-controlling procedures obtained using our results with the stepup procedure for control of the FDR of Benjamini and Yekutieli [Ann. Statist. 29 (2001) 1165–1188].

Submitted for Publication

31. (2021) “A User’s Guide to Approximate Randomization Tests with a Small Number of Clusters” (with Y. Cai, I. A. Canay and D. Kim), submitted (pdf).

Abstract

This paper provides a user’s guide to the general theory of approximate randomization tests developed in Canay et al. (2017a) when specialized to linear regressions with clustered data. Such regressions include settings in which the data is naturally grouped into clusters, such as villages or repeated observations over time on individual units, as well as settings with weak temporal dependence, in which pseudo-clusters may be formed using blocks of consecutive observations. An important feature of the methodology is that it applies to settings in which the number of clusters is small – even as small as five. We provide a step-by-step algorithmic description of how to implement the test and construct confidence intervals for the quantity of interest. We additionally articulate the main requirements underlying the test, emphasizing in particular common pitfalls that researchers may encounter. Finally, we illustrate the use of the methodology with two applications that further elucidate these points: one to a linear regression with clustered data based on Meng et al. (2015) and a second to a linear regression with temporally dependent data based on Munyo and Rossi (2015). The required software to replicate these empirical exercises and to aid researchers wishing to employ the methods elsewhere is provided in both R and Stata.

32. (2021) “A Two-Step Method for Testing Many Moment Inequalities” (with Y. Bai and A. Santos), resubmitted to the Journal of Business and Economic Statistics (pdf).

Abstract

This paper considers the problem of testing a finite number of moment inequalities. For this problem, Romano et al. (2014) propose a two-step testing procedure. In the first step, the procedure incorporates information about the location of moments using a confidence region. In the second step, the procedure accounts for the use of the confidence region in the first step by adjusting the significance level of the test appropriately. Its justification, however, has so far been limited to settings in which the number of moments is fixed with the sample size. In this paper, we provide weak assumptions under which the same procedure remains valid even in settings in which there are “many” moments in the sense that the number of moments grows rapidly with the sample size. We confirm the practical relevance of our theoretical guarantees in a simulation study. We additionally provide both numerical and theoretical evidence that the procedure compares favorably with the method proposed by Chernozhukov et al. (2019), which has also been shown to be valid in such settings.

33. (2020) “Inference for Large-Scale Linear Systems with Known Coefficients” (with Z. Fang, A. Santos, and A. Torgovitsky), revision requested by Econometrica (pdf).

Abstract

This paper considers the problem of testing whether there exists a non-negative solution to a possibly under-determined system of linear equations with known coefficients. This hypothesis testing problem arises naturally in a number of settings, including random coefficient, treatment effect, and discrete choice models, as well as a class of linear programming problems. As a first contribution, we obtain a novel geometric characterization of the null hypothesis in terms of identified parameters satisfying an infinite set of inequality restrictions. Using this characterization, we devise a test that requires solving only linear programs for its implementation, and thus remains computationally feasible in the high-dimensional applications that motivate our analysis. The asymptotic size of the proposed test is shown to equal at most the nominal level uniformly over a large class of distributions that permits the number of linear equations to grow with the sample size.
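
The null hypothesis concerns feasibility of a linear program, which the following sketch checks directly when the right-hand side is known. In the paper the right-hand side must be estimated, and the proposed test accounts for that sampling uncertainty, which this sketch does not attempt; the use of scipy.optimize.linprog and the example system are illustrative.

    import numpy as np
    from scipy.optimize import linprog

    def has_nonnegative_solution(A, b):
        """Check whether the linear system A x = b admits a solution with x >= 0."""
        A, b = np.asarray(A, float), np.asarray(b, float)
        # Feasibility-only linear program: minimize 0 subject to A x = b, x >= 0.
        res = linprog(c=np.zeros(A.shape[1]), A_eq=A, b_eq=b,
                      bounds=[(0, None)] * A.shape[1], method="highs")
        return bool(res.success)

    # Example: x1 + x2 = 1, x1 - x2 = 0 has the non-negative solution (0.5, 0.5).
    print(has_nonnegative_solution([[1, 1], [1, -1]], [1, 0]))   # True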

34. (2020) “Inference with Imperfect Randomization: The Case of the Perry Preschool Program” (with J. Heckman and R. Pinto), revision requested by the Journal of Econometrics (pdf).

Abstract

This paper considers the problem of making inferences about the effects of a program on multiple outcomes when the assignment of treatment status is imperfectly randomized. By imperfect randomization we mean that treatment status is reassigned after an initial randomization on the basis of characteristics that may be observed or unobserved by the analyst. We develop a partial identification approach to this problem that makes use of information limiting the extent to which randomization is imperfect to show that it is still possible to make nontrivial inferences about the effects of the program in such settings. We consider a family of null hypotheses in which each null hypothesis specifies that the program has no effect on one of several outcomes of interest. Under weak assumptions, we construct a procedure for testing this family of null hypotheses in a way that controls the familywise error rate – the probability of even one false rejection – in finite samples. We develop our methodology in the context of a reanalysis of the HighScope Perry Preschool program. We find statistically significant effects of the program on a number of different outcomes of interest, including outcomes related to criminal activity for males and females, even after accounting for the imperfectness of the randomization and the multiplicity of null hypotheses.
 

35. (2020) “Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievement across Countries” (with M. Mogstad, J. P. Romano, and D. Wilhelm), revision requested by the Review of Economic Studies (pdf).

Abstract

It is often desired to rank different populations according to the value of some feature of each population. For example, it may be desired to rank neighborhoods according to some measure of intergenerational mobility or countries according to some measure of academic achievement. These rankings are invariably computed using estimates rather than the true values of these features. As a result, there may be considerable uncertainty concerning the rank of each population. In this paper, we consider the problem of accounting for such uncertainty by constructing confidence sets for the rank of each population. We consider both the problem of constructing marginal confidence sets for the rank of a particular population as well as simultaneous confidence sets for the ranks of all populations. We show how to construct such confidence sets under weak assumptions. An important feature of all of our constructions is that they remain computationally feasible even when the number of populations is very large. We apply our theoretical results to re-examine the rankings of both neighborhoods in the United States in terms of intergenerational mobility and developed countries in terms of academic achievement. The conclusions about which countries do best and worst at reading, math, and science are fairly robust to accounting for uncertainty. By comparison, several celebrated findings about intergenerational mobility in the United States are not robust to taking uncertainty into account.
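
One simple way to pass from joint inference on pairwise differences to inference on a rank, in the spirit of (but not identical to) the paper's constructions, is sketched below: given simultaneous confidence bounds lower[k] and upper[k] for theta_k - theta_j, the rank of population j (with rank 1 denoting the largest value) is bracketed by counting how many populations surely, and possibly, lie above it. All names are illustrative.

    import numpy as np

    def rank_confidence_set(lower, upper, j):
        """Confidence set for the rank of population j (rank 1 = largest value)."""
        lower, upper = np.asarray(lower, float), np.asarray(upper, float)
        others = np.arange(lower.size) != j
        surely_above = np.sum(lower[others] > 0)     # populations that must rank above j
        possibly_above = np.sum(upper[others] > 0)   # populations that might rank above j
        return 1 + surely_above, 1 + possibly_above  # bounds on the rank of j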

36. (2020) “Randomization Tests in Observational Studies with Time-varying Adoption of Treatment” (with P. Toulis), resubmitted to the Journal of the American Statistical Association (pdf).

Abstract

This paper considers the problem of inference in observational studies with time-varying adoption of treatment. In addition to an unconfoundedness assumption that the potential outcomes are independent of the times at which units adopt treatment conditional on the units’ observed characteristics, our analysis assumes that the time at which each unit adopts treatment follows a Cox proportional hazards model. This assumption permits the time at which each unit adopts treatment to depend on the observed characteristics of the unit, but imposes the restriction that the probability of multiple units adopting treatment at the same time is zero. In this context, we study Fisher-style randomization tests of a null hypothesis that specifies that there is no treatment effect for all units and all time periods in a distributional sense. We first show that an infeasible test that treats the parameters of the Cox model as known has rejection probability no greater than the nominal level in finite samples. We then establish that the feasible test that replaces these parameters with consistent estimators has limiting rejection probability no greater than the nominal level. In a simulation study, we examine the practical relevance of our theoretical results, including robustness to misspecification of the model for the time at which each unit adopts treatment. Finally, we provide an empirical application of our methodology using the synthetic control-based test statistic and tobacco legislation data found in Abadie et al. (2010). 

Working Papers

37. (2020) “Partial Identification of Treatment Effect Rankings with Instrumental Variables” (with Y. Bai and E. J. Vytlacil), working paper (pdf).

38. (2004) “Electricity Regulation in California and Input Market Distortions” (with M. Jacobsen), Stanford Institute for Economic Policy Research Discussion Paper 03-016 (pdf).

Abstract

We provide an analysis of the soft price cap regulation that occurred in California’s electricity market between December 2000 and June 2001. We demonstrate the incentive it created to distort the prices of electricity inputs. After introducing a theoretical model of the incentive, we present empirical data from two important input markets: pollution emissions permits and natural gas. We find substantial evidence that generators manipulated these costs in a way that allowed them to justify bids in excess of the price cap and earn higher rents than they could otherwise. Our analysis suggests that the potential benefits of soft price cap regulation were likely undone by such behavior.

Acknowledgements

Financial support from the Stanford Institute for Economic Policy Research, the National Science Foundation and the Alfred P. Sloan Foundation is gratefully acknowledged. Parts of the above research were conducted while in residence at the Cowles Foundation for Research in Economics and the Hoover Institution at Stanford University.