Code and Data

Code for DDML and Lasso

For Stata code, see https://statalasso.github.io/, which documents a set of Stata packages for machine learning. The packages include features for prediction, model selection and causal inference.

  1. lassopack implements lasso (Tibshirani 1996), square-root lasso (Belloni et al. 2011), elastic net (Zou & Hastie 2005), ridge regression (Hoerl & Kennard 1970), adaptive lasso (Zou 2006) and post-estimation OLS. lassopack also supports logistic lasso.
  2. pdslasso offers methods to facilitate causal inference in structural models. The package allows the user to select control variables and/or instruments from a large set of variables in settings where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) causal variables of interest.
  3. pystacked implements stacking regression (Wolpert, 1992) via scikit-learn’s sklearn.ensemble.StackingRegressor and sklearn.ensemble.StackingClassifier. Stacking is a way of combining predictions from multiple supervised machine learners (the “base learners”) into a final prediction to improve performance.
  4. ddml implements Double/Debiased Machine Learning (DDML) for Stata. Five different estimators are supported, allowing for flexible estimation of causal effects of endogenous variables in settings with unknown functional forms and/or many exogenous variables. ddml is compatible with many existing supervised machine learning programs in Stata.

For R code, see the ddml package.
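The following minimal Python sketch illustrates cross-fitting in the simplest DDML setting, the partially linear model y = θd + g(x) + ε. The conditional means E[y|x] and E[d|x] are estimated on all but one fold and used to residualise the held-out fold. θ is then obtained by regressing the y-residuals on the d-residuals. This is not the code in the ddml packages. The k-nearest-neighbour learner (`knn_predict`), the function name `ddml_plm`, the use of two folds and the simulated design are all illustrative assumptions, and any supervised learner could take the place of `knn_predict`.

```python
import math
import random


def knn_predict(x_train, y_train, x_new, k=25):
    """Hypothetical nuisance learner: mean of y over the k nearest training points (1-D x)."""
    preds = []
    for x0 in x_new:
        nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))[:k]
        preds.append(sum(y_train[i] for i in nearest) / k)
    return preds


def ddml_plm(y, d, x, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimator of theta in y = theta*d + g(x) + e."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        x_tr = [x[i] for i in train]
        x_ho = [x[i] for i in fold]
        # Nuisances E[y|x] and E[d|x] are fitted on the other folds only (cross-fitting)
        ey = knn_predict(x_tr, [y[i] for i in train], x_ho)
        ed = knn_predict(x_tr, [d[i] for i in train], x_ho)
        for j, i in enumerate(fold):
            y_res[i] = y[i] - ey[j]
            d_res[i] = d[i] - ed[j]
    # Final stage: OLS of the y-residuals on the d-residuals (no intercept)
    den = sum(r * r for r in d_res)
    theta = sum(rd * ry for rd, ry in zip(d_res, y_res)) / den
    # Heteroskedasticity-robust standard error of theta
    eps = [ry - theta * rd for ry, rd in zip(y_res, d_res)]
    se = math.sqrt(sum((rd * e) ** 2 for rd, e in zip(d_res, eps))) / den
    return theta, se


# Simulated example (hypothetical design): true theta = 0.5, g(x) = x**2, E[d|x] = sin(x)
rng = random.Random(1)
n = 1000
x = [rng.uniform(-2, 2) for _ in range(n)]
d = [math.sin(xi) + rng.gauss(0, 1) for xi in x]
y = [0.5 * di + xi ** 2 + rng.gauss(0, 1) for di, xi in zip(d, x)]
theta_hat, se = ddml_plm(y, d, x)
print(f"theta_hat = {theta_hat:.3f} (s.e. {se:.3f})")
```

With one split into two folds, the estimate depends on the random fold assignment. Repeating the cross-fitting over several splits and combining the results is a common refinement.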

Below are links to Stata and MATLAB code for running the empirical examples from “High-Dimensional Methods and Inference on Structural and Treatment Effects”. The Stata code includes a stand-alone .ado file that can be used to obtain LASSO and Post-LASSO estimates in Stata.

  1. MATLAB Code
  2. Stata Code
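The following is a plain-Python sketch of LASSO and Post-LASSO estimation. It runs cyclic coordinate descent for the lasso objective (1/(2n))‖y − Xb‖² + λ‖b‖₁ and then refits OLS on the variables the lasso selects. It is an illustration, not the code in the .ado file. The function names, the fixed penalty λ = 0.5 and the 8-observation Hadamard design are hypothetical choices. The design was picked because, with orthogonal columns, the lasso reduces to soft-thresholding the OLS coefficients, so the output can be checked by hand.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-10):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - Xb, with b = 0 initially
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            # Inner product of x_j with the partial residual that leaves x_j out
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def ols(X, y):
    """OLS via the normal equations X'X b = X'y, solved by Gaussian elimination with pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        c[k], c[piv] = c[piv], c[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for j in range(k, p):
                A[r][j] -= f * A[k][j]
            c[r] -= f * c[k]
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (c[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    return b


def post_lasso(X, y, lam):
    """Lasso for variable selection, then unpenalised OLS on the selected columns."""
    b_lasso = lasso_cd(X, y, lam)
    selected = [j for j, bj in enumerate(b_lasso) if bj != 0.0]
    b_post = [0.0] * len(b_lasso)
    if selected:
        X_sel = [[row[j] for j in selected] for row in X]
        for j, bj in zip(selected, ols(X_sel, y)):
            b_post[j] = bj
    return b_lasso, selected, b_post


def hadamard(m):
    """Sylvester construction of an m x m Hadamard matrix (m a power of 2); columns are orthogonal."""
    H = [[1.0]]
    while len(H) < m:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


H = hadamard(8)
y = [3.0 * row[1] - 2.0 * row[2] + 0.1 * row[5] for row in H]
b_lasso, selected, b_post = post_lasso(H, y, lam=0.5)
print("lasso:     ", [round(v, 3) for v in b_lasso])
print("selected:  ", selected)
print("post-lasso:", [round(v, 3) for v in b_post])
```

Here the lasso shrinks the coefficients 3 and −2 to 2.5 and −1.5 and drops the 0.1 term, which falls below the penalty. The post-lasso OLS refit then removes the shrinkage and recovers 3 and −2 on the selected columns.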

Code for IVQR

Below are links to MATLAB and Ox code for IVQR estimation and inference as developed in “Instrumental Quantile Regression Inference for Structural and Treatment Effect Models” (with Victor Chernozhukov) and “Instrumental Variable Quantile Regression” (with Victor Chernozhukov). The MATLAB code also implements the weak-identification-robust inference procedure proposed in “Instrumental Variable Quantile Regression: A Robust Inference Approach” (with Victor Chernozhukov). Each file also contains examples illustrating how the code can be used, and the data for these examples can be downloaded below.

  1. MATLAB Code
  2. Ox Code
  3. Data for examples
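In the simplest case, with one binary instrument and no covariates, the inverse quantile regression idea behind IVQR fits in a few lines of Python. For each candidate effect α, run a τ-quantile regression of y − αd on a constant and z, then choose the α for which the coefficient on z is closest to zero. With a binary z, that quantile regression reduces to comparing the τ-quantiles of y − αd in the two instrument groups. This is a sketch under those assumptions, not the MATLAB or Ox code linked above. The function names, grid and simulated data are hypothetical, and covariates or continuous instruments would require a full quantile regression solver.

```python
import math
import random


def sample_quantile(values, tau):
    """ceil(n*tau)-th order statistic: a minimiser of the check-function loss."""
    s = sorted(values)
    k = max(math.ceil(len(s) * tau), 1)
    return s[k - 1]


def ivqr_binary(y, d, z, tau, grid):
    """Inverse quantile regression with one binary instrument z and no covariates.

    For each candidate alpha, the tau-quantile regression of y - alpha*d on (1, z)
    has a solution whose slope equals the gap between the two groups' tau-quantiles;
    the estimate is the grid value that makes this gap smallest in absolute value.
    """
    best_alpha, best_gap = None, math.inf
    for alpha in grid:
        r1 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 1]
        r0 = [yi - alpha * di for yi, di, zi in zip(y, d, z) if zi == 0]
        gap = abs(sample_quantile(r1, tau) - sample_quantile(r0, tau))
        if gap < best_gap:
            best_alpha, best_gap = alpha, gap
    return best_alpha


# Simulated example (hypothetical design): the effect of d is 2 at every quantile,
# and d is endogenous because its first-stage error v also enters y.
rng = random.Random(42)
n = 4000
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
d = [1 if 1.5 * zi + vi > 0.5 else 0 for zi, vi in zip(z, v)]
y = [1.0 + 2.0 * di + 0.8 * vi + 0.6 * rng.gauss(0, 1) for di, vi in zip(d, v)]

grid = [i / 100 for i in range(401)]  # candidate alphas from 0 to 4
alpha_hat = ivqr_binary(y, d, z, tau=0.5, grid=grid)
print("IVQR estimate at the median:", alpha_hat)
```

The search over α is a grid search because the objective is not smooth in α. With several endogenous variables, the grid grows quickly, which is one reason practical implementations need more careful computation.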

Code for Weak Instrument Robust Inference

Below are links to the Stata code and data used in the empirical example in “A Simple Approach to Heteroskedasticity and Autocorrelation Robust Inference with Weak Instruments” (with Victor Chernozhukov). The data are taken from Acemoglu, Johnson, and Robinson (2001), “The Colonial Origins of Comparative Development: An Empirical Investigation”. The code illustrates the basic procedure and can easily be modified for other data sets and to provide inference that is robust to autocorrelation or clustering.

I thank Mel Stephens for noticing a small error in the original code, which has since been corrected. Because of this correction, the results produced by running the files below differ slightly from those in the published paper.

  1. Stata Code for weak instrument robust inference
  2. Data
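The basic procedure can be sketched in Python for one endogenous regressor x and one instrument z. For each hypothesised value β0, regress y − xβ0 on a constant and z, and keep β0 in the confidence set if a heteroskedasticity-robust t-test does not reject a zero coefficient on z. Replacing the HC0 variance with an autocorrelation- or cluster-robust one is where the modification mentioned above would go. This is an illustration under stated assumptions, not the linked Stata code. The assumptions are HC0 standard errors, a 1.96 critical value, a fixed grid and simulated data, and the function names are hypothetical.

```python
import math
import random


def robust_t_on_z(u, z):
    """HC0-robust t-statistic for the slope in the OLS regression of u on (1, z)."""
    n = len(u)
    zbar, ubar = sum(z) / n, sum(u) / n
    zc = [zi - zbar for zi in z]
    szz = sum(c * c for c in zc)
    b = sum(c * (ui - ubar) for c, ui in zip(zc, u)) / szz
    a = ubar - b * zbar
    e = [ui - a - b * zi for ui, zi in zip(u, z)]
    se_b = math.sqrt(sum((c * ei) ** 2 for c, ei in zip(zc, e))) / szz
    return b / se_b


def robust_confidence_set(y, x, z, grid, crit=1.96):
    """Grid points beta0 at which the coefficient on z in the regression of
    y - x*beta0 on (1, z) is not significantly different from zero."""
    accepted = []
    for b0 in grid:
        u = [yi - xi * b0 for yi, xi in zip(y, x)]
        if abs(robust_t_on_z(u, z)) <= crit:
            accepted.append(b0)
    return accepted


# Simulated example (hypothetical design): true beta = 1, endogenous and heteroskedastic errors
rng = random.Random(7)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + vi for zi, vi in zip(z, v)]
u = [(0.6 * vi + 0.8 * rng.gauss(0, 1)) * (1 + 0.5 * abs(zi)) for vi, zi in zip(v, z)]
y = [1.0 + xi + ui for xi, ui in zip(x, u)]

grid = [i / 100 for i in range(-200, 401)]  # candidate beta0 values from -2 to 4
cs = robust_confidence_set(y, x, z, grid)
print("accepted grid points range from", min(cs), "to", max(cs))
```

The set is obtained by inverting a test, not from a point estimate and standard error, so its validity does not rely on the instrument being strong. With weak instruments it can be wide, unbounded on the grid, or disjoint. For that reason the sketch collects accepted grid points instead of reporting an interval.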

Code for Finite Sample Inference for Quantile Regression

Below is a link to MATLAB code used to produce the results in Table 1 and Figure 1 in Chernozhukov, Hansen, and Jansson (2009) “Finite Sample Inference in Econometric Models via Quantile Restrictions.”

  1. MATLAB code for finite sample inference for quantile regression
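The key idea is clearest in the simplest case, a τ-quantile with no covariates. At the true quantile θ0, the count S(θ0) = #{i : y_i ≤ θ0} follows a Binomial(n, τ) distribution exactly, whatever the distribution of a continuous y. Inverting a test based on S therefore gives a confidence interval with guaranteed finite-sample coverage. The Python sketch below covers only that special case. It is not the MATLAB code for Table 1 and Figure 1, and the function names and toy data are illustrative. With covariates, the indicators 1{y_i ≤ q(x_i, θ0)} are still Bernoulli(τ) conditionally on the exogenous variables, but the test then has to be inverted over the whole parameter vector, which this sketch does not attempt.

```python
import math


def binom_cdf(k, n, p):
    """P(S <= k) for S ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))


def finite_sample_quantile_ci(y, tau, alpha=0.05):
    """Finite-sample confidence interval for the tau-quantile of i.i.d. continuous data.

    At the true quantile theta0, S(theta0) = #{i: y_i <= theta0} ~ Binomial(n, tau).
    Accepting theta when k_lo <= S(theta) <= k_hi gives the interval
    [y_(k_lo), y_(k_hi + 1)) in terms of order statistics.
    """
    n = len(y)
    s = sorted(y)
    # Largest k_lo with P(S <= k_lo - 1) <= alpha/2
    k_lo = 0
    while k_lo + 1 <= n and binom_cdf(k_lo, n, tau) <= alpha / 2:
        k_lo += 1
    # Smallest k_hi with P(S > k_hi) <= alpha/2
    k_hi = n
    while k_hi - 1 >= 0 and 1 - binom_cdf(k_hi - 1, n, tau) <= alpha / 2:
        k_hi -= 1
    lower = s[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = s[k_hi] if k_hi < n else math.inf
    coverage = binom_cdf(k_hi, n, tau) - binom_cdf(k_lo - 1, n, tau)
    return lower, upper, coverage


# Toy data: ten distinct observations
data = [7.0, 3.0, 10.0, 1.0, 5.0, 9.0, 2.0, 8.0, 6.0, 4.0]
lo, hi, cov = finite_sample_quantile_ci(data, tau=0.5)
print(f"95% interval for the median: [{lo}, {hi}), coverage {cov:.4f}")
```

With the ten toy observations, the interval runs from the 2nd to the 9th order statistic. Its coverage is 1002/1024 ≈ 0.979, not exactly 0.95, because S is discrete.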

Code for Sensitivity Analysis for IV (from “Plausibly Exogenous”)

The Stata command for IV sensitivity analysis, plausexog, is available from SSC and can be installed in Stata by typing

ssc install plausexog

Documentation is available here. (A big thanks to Damian Clarke for putting together this nice set of code.)

Additional resources:

  1. Stata Code for IV sensitivity analysis: Stata code that produces some of the results from “Plausibly Exogenous” (with Tim Conley and Peter Rossi). The code illustrates the basic procedure and can easily be modified for other data sets. The file with the Stata code also includes sample data.
  2. Review of Economics and Statistics Replication Files: files, maintained by REStat, to replicate all results from “Plausibly Exogenous”.
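One of the approaches in “Plausibly Exogenous” is a union of confidence intervals. The instrument z may have a small direct effect γ on y. If γ is only known to lie in some set, compute the IV confidence interval for β using y − γ0·z as the outcome for each γ0 in that set, and report the union of those intervals. Below is a minimal Python sketch with one regressor and one instrument. It does not reproduce plausexog's interface. The function names, the support |γ| ≤ 0.2, HC0 standard errors and the simulated data are assumptions for illustration.

```python
import math
import random


def iv_ci(y, x, z, crit=1.96):
    """Just-identified IV estimate of beta in y = a + beta*x + e (instrument z)
    and its HC0-robust confidence interval."""
    n = len(y)
    zbar = sum(z) / n
    zc = [zi - zbar for zi in z]
    szx = sum(c * xi for c, xi in zip(zc, x))
    beta = sum(c * yi for c, yi in zip(zc, y)) / szx
    a = sum(y) / n - beta * sum(x) / n
    e = [yi - a - beta * xi for yi, xi in zip(y, x)]
    se = math.sqrt(sum((c * ei) ** 2 for c, ei in zip(zc, e))) / abs(szx)
    return beta - crit * se, beta + crit * se


def union_of_cis(y, x, z, gammas, crit=1.96):
    """Union (reported as its hull) of IV intervals computed with outcome y - gamma0*z
    for each assumed direct effect gamma0 of z on y."""
    bounds = [iv_ci([yi - g * zi for yi, zi in zip(y, z)], x, z, crit) for g in gammas]
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)


# Simulated example (hypothetical design): beta = 1, and z has a small direct effect 0.1 on y
rng = random.Random(3)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + vi for zi, vi in zip(z, v)]
y = [1.0 + xi + 0.1 * zi + 0.5 * vi + rng.gauss(0, 1) for xi, zi, vi in zip(x, z, v)]

lo0, hi0 = iv_ci(y, x, z)  # usual interval, imposing the exclusion restriction gamma = 0
gammas = [0.2 * i / 20 for i in range(-20, 21)]  # assumed support: |gamma| <= 0.2
lo, hi = union_of_cis(y, x, z, gammas)
print(f"gamma = 0: [{lo0:.3f}, {hi0:.3f}]   |gamma| <= 0.2: [{lo:.3f}, {hi:.3f}]")
```

The IV estimate moves continuously with γ0, so on a fine grid neighbouring intervals overlap and the union is a single interval. This is why the sketch reports only its endpoints. Shrinking the support to {0} recovers the usual IV interval, which the example prints for comparison.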