Methods for Causal Mediation Analysis with Applications

    • Hong, G.*, Yang, F.*, & Qin, X. (2023). Posttreatment confounding in causal mediation studies: A cutting-edge problem and a novel solution via sensitivity analysis. Biometrics, 79, 1042-1056. (* equal first-authors)

In causal mediation studies that decompose an average treatment effect into indirect and direct effects, examples of post-treatment confounding are abundant. In the presence of treatment-by-mediator interactions, past research has generally considered it infeasible to adjust for a post-treatment confounder of the mediator-outcome relationship due to incomplete information: for any given individual, a post-treatment confounder is observed under the actual treatment condition while missing under the counterfactual treatment condition. This paper proposes a new sensitivity analysis strategy for handling post-treatment confounding and incorporates it into weighting-based causal mediation analysis. The key is to obtain the conditional distribution of the post-treatment confounder under the counterfactual treatment as a function of not just pretreatment covariates but also its counterpart under the actual treatment. The sensitivity analysis then generates a bound for the natural indirect effect and that for the natural direct effect over a plausible range of the conditional correlation between the post-treatment confounder under the actual and that under the counterfactual conditions. Implemented through either imputation or integration, the strategy is suitable for binary as well as continuous measures of post-treatment confounders. Simulation results demonstrate major strengths and potential limitations of this new solution. A re-analysis of the National Evaluation of Welfare-to-Work Strategies (NEWWS) Riverside data reveals that the initial analytic results are sensitive to omitted post-treatment confounding.

    • Qin, X., Deutsch, J., & Hong, G. (2021). Revealing heterogeneity in complex mediation mechanisms: Two concurrent mediators. Journal of Policy Analysis and Management, 40(1), 158-190.

This study aims to test the theory underlying Job Corps, one of the largest education and training programs in the U.S. serving disadvantaged youth. Central to the program are vocational training and general education that serve as two concurrent mediators transmitting the program impact on earnings. To distinguish the relative contribution of each, we develop methods for decomposing the Job Corps impact on earnings into an indirect effect transmitted through vocational training, an indirect effect transmitted through general education, and a direct effect attributable to supplementary services. We further ask whether general education and vocational training reinforce each other and produce a joint impact greater than the sum of the two separate pathways. Moreover, we examine the heterogeneity of each causal effect across all the Job Corps centers. This article presents concepts and methods for defining, identifying, and estimating not only the population averages but also the between-site variance of these causal effects. Our analytic procedure incorporates a series of weighting strategies to enhance the internal and external validity of the results and assesses the sensitivity to potential violations of the identification assumptions.

    • Qin, X., Hong, G., Deutsch, J., & Bein, E. (2019). Multisite causal mediation analysis in the presence of complex sample and survey designs and non-random nonresponse. Journal of the Royal Statistical Society, Series A, 182, Part 4, 1343-1370.

This study provides a template for multisite causal mediation analysis using a comprehensive weighting-based analytic procedure that enhances external and internal validity. The template incorporates a sample weight to adjust for complex sample and survey designs, adopts an inverse probability of treatment weight to adjust for differential treatment assignment probabilities, employs an estimated non-response weight to account for non-random non-response and utilizes a propensity-score-based weighting strategy to decompose flexibly not only the population average but also the between-site heterogeneity of the total programme impact. Because the identification assumptions are not always warranted, a weighting-based balance checking procedure assesses the remaining overt bias, whereas a weighting-based sensitivity analysis further evaluates the potential bias related to omitted confounding or to propensity score model misspecification. We derive the asymptotic variance of the estimators for the causal effects that account for the sampling uncertainty in the estimated weights. The method is applied to a reanalysis of the data from the National Job Corps Study.

    • Hong, G., Qin, X., & Yang, F. (2018). Weighting-based sensitivity analysis in causal mediation studies. Journal of Educational and Behavioral Statistics, 43(1), 32-56.

Through a sensitivity analysis, the analyst attempts to determine whether a conclusion of causal inference could be easily reversed by a plausible violation of an identification assumption. Analytic conclusions that are harder to alter by such a violation are expected to add a higher value to scientific knowledge about causality. This article presents a weighting-based approach to sensitivity analysis for causal mediation studies. Extending the ratio-of-mediator-probability weighting (RMPW) method for identifying natural indirect effect and natural direct effect, the new strategy assesses potential bias in the presence of omitted pretreatment or posttreatment covariates. Such omissions may undermine the causal validity of analytic conclusions. The weighting approach to sensitivity analysis reduces the reliance on functional form assumptions and removes constraints on the measurement scales for the mediator, the outcome, and the omitted covariates. In its essence, the discrepancy between a new weight that adjusts for an omitted confounder and an initial weight that omits the confounder captures the role of the confounder that contributes to the bias. The effect size of the bias due to omitted confounding of the mediator-outcome relationship is a product of two sensitivity parameters, one associated with the degree to which the omitted confounders predict the mediator and the other associated with the degree to which they predict the outcome. The article provides an application example and concludes with a discussion of broad applications of this new approach to sensitivity analysis. Supplementary material includes R code for implementing the proposed sensitivity analysis procedure.

    • Bein, E., Deutsch, J., Hong, G., Porter, K., Qin, X., & Yang, C. (2018). Two-step estimation in RMPW analysis. Statistics in Medicine, 37(8), 1304-1324.

This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score-based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio-of-mediator-probability weighting (RMPW) estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score-based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step’s regression coefficient estimates. Statistical inferences obtained from this two-step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to RMPW analysis a solution to the two-step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance-covariance matrix for the indirect effect and direct effect two-step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score-based weighting.

  • Qin, X., & Hong, G. (2017). A weighting method for assessing between-site heterogeneity in causal mediation mechanism. Journal of Educational and Behavioral Statistics, 42(4), 491-495.

    When a multisite randomized trial reveals between-site variation in program impact, methods are needed for further investigating heterogeneous mediation mechanisms across the sites. We conceptualize and identify a joint distribution of site-specific direct and indirect effects under the potential outcomes framework. A method-of-moments procedure incorporating ratio-of-mediator-probability weighting (RMPW) consistently estimates the causal parameters. This strategy conveniently relaxes the assumption of no Treatment × Mediator interaction while greatly simplifying the outcome model specification without invoking strong distributional assumptions. We derive asymptotic standard errors that reflect the sampling variability of the estimated weight. We also offer an easy-to-use R package, MultisiteMediation, that implements the proposed method. It is freely available at the Comprehensive R Archive Network (CRAN).

  • Raudenbush, S. W., & Hong, G. (2017). Three mediation stories, three analytic strategies. Association for Psychological Science Observer. February 2017.


  • Qin, X., & Hong, G. (2016). Analyzing heterogeneous causal mediation effects in multi-site trials with application to the National Job Corps Study. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association. pp.910-938.


  • Hong, G., Deutsch, J., & Hill, H. D. (2015). Ratio-of-mediator-probability weighting for causal mediation analysis in the presence of treatment-by-mediator interaction. Journal of Educational and Behavioral Statistics, 40(3), 307-340. (Supplementary Materials)
  • Conventional methods for mediation analysis generate biased results when the mediator–outcome relationship depends on the treatment condition. This article shows how the ratio-of-mediator-probability weighting (RMPW) method can be used to decompose total effects into natural direct and indirect effects in the presence of treatment-by-mediator interactions. The indirect effect can be further decomposed into a pure indirect effect and a natural treatment-by-mediator interaction effect. Similar to other techniques for causal mediation analysis, RMPW generates causally valid results when the sequential ignorability assumptions hold. Yet unlike the model-based alternatives, including path analysis, structural equation modeling, and their latest extensions, RMPW requires relatively few assumptions about the distribution of the outcome, the distribution of the mediator, and the functional form of the outcome model. Correct specification of the propensity score models for the mediator remains crucial when parametric RMPW is applied. This article gives an intuitive explanation of the RMPW rationale, a mathematical proof, and simulation results for the parametric and nonparametric RMPW procedures. We apply the technique to identifying whether employment mediated the relationship between an experimental welfare-to-work program and maternal depression. A detailed delineation of the analytic procedures is accompanied by online Stata code as well as a stand-alone RMPW software program to facilitate users’ analytic decision making.
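The weighting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' Stata code or the stand-alone RMPW program. It assumes simulated data with a randomized binary treatment, a binary mediator, and a single binary pretreatment covariate, and it estimates the mediator probabilities nonparametrically as cell proportions. All names (`simulate`, `rmpw_estimates`) and all numeric values are hypothetical. Each treated unit is weighted by the ratio of its mediator probability under the control condition to that under the treated condition, so that the weighted treated mean approximates the average outcome under treatment with the mediator at its control-condition level.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (x, t, m, y) with a treatment-by-mediator interaction (hypothetical model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = int(rng.random() < 0.5)            # binary pretreatment covariate
        t = int(rng.random() < 0.5)            # randomized treatment
        p_m = 0.2 + 0.3 * t + 0.2 * x          # P(M = 1 | T, X)
        m = int(rng.random() < p_m)            # binary mediator
        y = 1.0 + 0.5 * t + 1.0 * m + 0.5 * t * m + 0.5 * x + rng.gauss(0, 1)
        rows.append((x, t, m, y))
    return rows


def rmpw_estimates(rows):
    """Estimate total, natural direct, and natural indirect effects by RMPW."""
    count, ones = {}, {}
    for x, t, m, _ in rows:
        count[(t, x)] = count.get((t, x), 0) + 1
        ones[(t, x)] = ones.get((t, x), 0) + m

    def p_med(m, t, x):
        # Cell-proportion estimate of P(M = m | T = t, X = x)
        p1 = ones[(t, x)] / count[(t, x)]
        return p1 if m == 1 else 1.0 - p1

    treated = [(x, m, y) for x, t, m, y in rows if t == 1]
    control = [y for _, t, _, y in rows if t == 0]

    mean_treated = sum(y for _, _, y in treated) / len(treated)
    mean_control = sum(control) / len(control)

    # RMPW weight for treated units: P(M = m | T = 0, X) / P(M = m | T = 1, X)
    weights = [p_med(m, 0, x) / p_med(m, 1, x) for x, m, _ in treated]
    mean_counterfactual = (
        sum(w * y for w, (_, _, y) in zip(weights, treated)) / sum(weights)
    )

    return {
        "TE": mean_treated - mean_control,
        "NDE": mean_counterfactual - mean_control,
        "NIE": mean_treated - mean_counterfactual,
    }


estimates = rmpw_estimates(simulate())
print(estimates)
```

Under the simulated model the population values are 0.65 for the natural direct effect and 0.45 for the natural indirect effect, so the printed estimates should fall near those values. With many covariates, a parametric mediator model would replace the cell proportions, which is where the abstract's caution about correct propensity score specification applies.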

    • Qin, X., & Hong, G. (2014). Causal mediation analysis in multi-site trials: An application of ratio-of-mediator-probability weighting to the Head Start Impact Study. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp.912-926.


    • Hong, G., & Nomi, T. (2012). Weighting methods for assessing policy effects mediated by peer change. Journal of Research on Educational Effectiveness special issue on the statistical approaches to studying mediator effects in education research, 5(3), 261-289.

    The conventional approaches to mediation analysis such as path analysis and structural equation modeling typically involve specifying two structural models, one for the mediator and the other for the outcome. We employ an alternative approach that avoids some strong identification assumptions invoked by the conventional approaches. By applying a new weighting procedure to the observed data, we estimate the average potential outcome if the entire population were treated, the average potential outcome if the entire population were untreated, and the average potential outcome if the entire population were treated and if every individual unit’s mediator value would counterfactually remain at the same level as it would be when untreated. The estimated differences among these average potential outcomes provide estimates of the total effect, the natural direct effect, and the natural indirect effect. Applying this approach to multilevel educational data, we evaluate the total effect of the algebra-for-all policy in the Chicago Public Schools by comparing the math achievement of two ninth-grade cohorts. We further investigate whether the policy effect was mediated by the policy-induced change in class peer ability. Combining weighting with prognostic score-based difference-in-differences adjustment enables us to reduce both measured and unmeasured confounding.
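    The three average potential outcomes named in this abstract can be written compactly. The notation below is an assumption introduced for illustration rather than taken from the article: Y(t, M(t')) denotes the outcome under treatment condition t when the mediator takes the value it would have under condition t', with 1 for treated and 0 for untreated.

```latex
\begin{align*}
\text{Total effect} &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\text{Natural direct effect} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\text{Natural indirect effect} &= E[Y(1, M(1))] - E[Y(1, M(0))]
\end{align*}
```

    The natural direct and natural indirect effects sum to the total effect, because the counterfactual term E[Y(1, M(0))] cancels.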

  • Hong, G., & Nomi, T. (2012). Rejoinder. Journal of Research on Educational Effectiveness special issue on the statistical approaches to studying mediator effects in education research, 5(3), 299-302.


  •  Hong, G., Deutsch, J., & Hill, H. (2011). Parametric and non-parametric weighting methods for estimating mediation effects: An application to the National Evaluation of Welfare-to-Work Strategies. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp.3215-3229. (Supplementary Tables)


  • Hong, G. (2010). Ratio of mediator probability weighting for estimating natural direct and indirect effects. In JSM Proceedings, Biometrics Section. Alexandria, VA: American Statistical Association, pp.2401-2415.

Methods for Causal Moderation Analysis with Applications

    • Hong, Y., & Hong, G. (2021). Schools with test-based promotion: Effects on instructional time allocation and student learning in third grade. AERA Open, 7(1), 1-15.

This study is focused on the threat of retention associated with test-based promotion in grade 3. Through analyzing the Early Childhood Longitudinal Study Kindergarten Class of 1998-99 data, we found that schools having such a policy apparently increased math instructional time but not reading instructional time in grade 3. On average, the policy did not produce significant differences in third graders’ reading and math learning. However, there seemed to be a notable increase in the proportion of students who achieved an above-average proficiency level in grade 3 math. In both reading and math, the test-based promotion seemingly benefited students at the average or lower than average ability levels. In contrast, there was no evidence that the policy had an impact on students at the two ends of the ability distribution. We discussed the implication of the findings for the current design and implementation of test-based promotion in early grades.

    • Garrett, R., & Hong, G. (2016). Impacts of grouping and time on the math learning of language minority kindergartners. Educational Evaluation and Policy Analysis, 38(2), 222-244.
      •  Previous research has indicated benefits and potential pitfalls of within-class homogeneous and heterogeneous ability grouping for elementary math learning. However, there has been scant evidence with regard to the impacts of grouping for language minority kindergartners who may experience the small group setting differentially due to their particular needs for math and English language skill development. Analyzing the Early Childhood Longitudinal Study-Kindergarten cohort data, we find that heterogeneous grouping or a combination of heterogeneous and homogeneous grouping under relatively adequate time allocation is optimal for enhancing teacher ratings of language minority kindergartners’ math performance, while using homogeneous grouping only is detrimental. As hypothesized, suboptimal instructional organization seems to place language minority kindergartners in a vulnerable situation.
  • Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for evaluating multi-valued and multiple treatments with non-experimental data. Psychological Methods, 17(1), 44-60.
    • Propensity score matching and stratification enable researchers to make statistical adjustment for a large number of observed covariates in non-experimental data. These methods have recently become popular in psychological research. Yet their applications to evaluations of multivalued and multiple treatments are limited. The inverse-probability-of-treatment weighting (IPTW) method, though suitable for evaluating multivalued and multiple treatments, often generates results that are not robust when only a portion of the population provides support for causal inference or when the functional form of the propensity score model is misspecified. The marginal mean weighting through stratification (MMW-S) method promises a viable non-parametric solution to these problems. By computing weights on the basis of stratified propensity scores, MMW-S adjustment equates the pretreatment composition of multiple treatment groups under the assumption that unmeasured covariates do not confound the treatment effects given the observed covariates. Analyzing data from a weighted sample, researchers can estimate a causal effect by computing the difference between the estimated average potential outcomes associated with alternative treatments within the ANOVA framework. After providing an intuitive illustration of the theoretical rationale underlying the weighting method for causal inferences, the paper demonstrates how to apply the MMW-S method to evaluations of treatments measured on a binary, ordinal, or nominal scale approximating a completely randomized experiment, to studies of multiple concurrent treatments approximating factorial randomized designs, and to moderated treatment effects approximating randomized block designs. The analytic procedure is illustrated with an evaluation of educational services for English language learners attending kindergarten in the US.
  • Hong, G., Corter, C., Hong, Y., & Pelletier, J. (2012). Differential effects of literacy instruction time and homogeneous grouping in kindergarten: Who will benefit? Who will suffer? Educational Evaluation and Policy Analysis. 34(1), 69-88. (Supplementary Material)
    •  This study challenges the belief that homogeneous ability grouping benefits high-ability students in cognitive and social-emotional development at the expense of their low-ability peers. From a developmental point of view, we hypothesize that homogeneous grouping may improve the learning behaviors and may benefit the literacy learning of kindergartners at all ability levels through adaptive instruction under adequate instructional time. The benefits are expected to be more evident for medium- and low-ability children than for high-ability children. However, when instructional time is limited, low-ability children may suffer from high-intensity grouping, defined as grouping taking up a large proportion of instructional time. We also examine whether low-ability kindergartners develop lower self-esteem as a result of homogeneous grouping. Analyzing the Early Childhood Longitudinal Study Kindergarten cohort (ECLS-K) data, we find no overall advantage of homogeneous grouping for high-ability students. For medium-ability students’ literacy growth, homogeneous grouping appears to be optimal when teachers spend more than one hour per day on literacy instruction; high-intensity grouping shows additional advantage for improving these students’ general learning behaviors. For low-ability kindergartners, homogeneous grouping with ample instruction time seems to improve their general learning behaviors whereas low-intensity grouping with ample instruction time seems to reduce internalizing problem behaviors. Yet for low-ability students’ literacy growth, we find a detrimental effect of high-intensity grouping when instructional time is limited. These findings contradict results from past research and have important implications for educational theories and practice.
  • Hong, G. (2010). Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data. Journal of Educational and Behavioral Statistics, 35(5), 499-531.
    •  Defining causal effects as comparisons between marginal population means, this paper introduces marginal mean weighting through stratification (MMW-S) to adjust for selection bias in multi-level educational data. The paper formally shows the inherent connections among the MMW-S method, propensity score stratification, and inverse-probability-of-treatment weighting (IPTW). Both MMW-S and IPTW are suitable for evaluating multiple concurrent treatments, hence have broader applications than matching, stratification, or covariance adjustment for the propensity score. Furthermore, mathematical consideration and a series of simulations reveal that the MMW-S method has incorporated some important strengths of the propensity score stratification method, which generally enhance the robustness of MMW-S estimates in comparison with IPTW estimates.  To illustrate, I apply the MMW-S method to evaluations of within-class homogeneous grouping in early elementary reading instruction.
  • Hong, G., & Hong, Y. (2009). Reading instruction time and homogeneous grouping in kindergarten: An application of marginal mean weighting through stratification. Educational Evaluation and Policy Analysis, 31(1), 54-81.
    • A kindergartner’s opportunities to develop reading and language arts skills are constrained by the amount of time allocated to reading instruction. In the meantime, the student’s engagement in learning tasks may increase if the instruction has been adapted to his or her prior ability through homogeneous grouping. This study investigates whether the grouping effects on kindergartners’ reading growth depend on the amount of reading instruction time and the intensity of grouping. To answer our research questions requires causal inferences about concurrent multi-valued instructional treatments. We develop a procedure of applying the method of marginal mean weighting through stratification to multi-level educational data. Results from the Early Childhood Longitudinal Study Kindergarten cohort (ECLS-K) data set lend support to our theoretical hypothesis that, when teachers allocate a substantial amount of time to reading instruction, homogeneous grouping helps kindergartners to gain more in reading. We find no effect of homogeneous grouping when the total amount of reading time is limited. We also find that the benefit of increasing reading instruction time becomes evident only if kindergarten teachers adapt instruction through homogeneous grouping.
  • Hong, G., & Raudenbush, S. W. (2008). Causal inference for time-varying instructional treatments. Journal of Educational and Behavioral Statistics, 33(3), 333-362.
  • Understanding the impact of sequences of instructional experiences on children’s learning is central to the study of teaching and learning in school settings. In this paper, we model the effects of time-varying instructional treatments on repeatedly observed student achievement. In doing so, we confront three challenges to causal inference: 1) the yearly re-allocation of students to classrooms and teachers creates a complex structure of dependence among responses; 2) a child’s learning outcome under a certain treatment may depend on the treatment assignment of other children as well as the skill of the teacher, and may depend on the classmates and teachers that the child encountered in the past years; and 3) time-varying confounding poses special problems of endogeneity in these complex multi-level settings. We address these challenges by modifying the Stable-Unit-Treatment-Value Assumption (SUTVA) to identify potential outcomes and causal effects and by integrating inverse-probability-of-treatment weighting (IPTW) into a four-way value-added hierarchical model with pseudo-likelihood estimation. Using data from the Longitudinal Evaluation of School Change and Performance (LESCP), we apply these methods to study the impact of “intensive math instruction” in grades 4 and 5.
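
Several entries in this section rely on marginal mean weighting through stratification (MMW-S), which computes weights from stratified propensity scores. The sketch below is a minimal illustration under stated assumptions, not code from any of the papers. The data are simulated, the true propensity score stands in for an estimated one, the sample is split into five equal-sized strata, and the function names are hypothetical. The weight used is n_s × Pr(Z = z) / n_{z,s}: the stratum size times the marginal share of treatment group z, divided by the number of units in stratum s that received z.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate confounded data; the true treatment effect on y is 1.0 (hypothetical model)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                       # observed confounder
        ps = 0.2 + 0.6 * x                     # propensity score, taken as known here
        z = int(rng.random() < ps)             # binary treatment
        y = 2.0 * x + 1.0 * z + rng.gauss(0, 1)
        data.append({"ps": ps, "z": z, "y": y})
    return data


def mmws_weights(data, n_strata=5):
    """Return one MMW-S weight per unit: n_s * Pr(Z = z) / n_{z,s}."""
    n = len(data)
    # Assign equal-sized strata by the rank of the propensity score.
    order = sorted(range(n), key=lambda i: data[i]["ps"])
    stratum = [0] * n
    for rank, i in enumerate(order):
        stratum[i] = rank * n_strata // n

    pr_z = {z: sum(d["z"] == z for d in data) / n for z in (0, 1)}
    n_s, n_zs = {}, {}
    for i, d in enumerate(data):
        s = stratum[i]
        n_s[s] = n_s.get(s, 0) + 1
        n_zs[(d["z"], s)] = n_zs.get((d["z"], s), 0) + 1

    return [
        n_s[stratum[i]] * pr_z[d["z"]] / n_zs[(d["z"], stratum[i])]
        for i, d in enumerate(data)
    ]


def mean_difference(data, weights):
    """Weighted mean of y among treated minus weighted mean among untreated."""
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for d, w in zip(data, weights):
        num[d["z"]] += w * d["y"]
        den[d["z"]] += w
    return num[1] / den[1] - num[0] / den[0]


data = simulate()
naive = mean_difference(data, [1.0] * len(data))
adjusted = mean_difference(data, mmws_weights(data))
print(naive, adjusted)
```

In this simulation the unweighted difference overstates the effect because treated units have higher values of the confounder, while the MMW-S weighted difference should land much closer to the true value of 1.0. Some residual bias remains within strata, which is one reason the number of strata is a design choice.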

Methods for Analyzing Spillover Effects with Applications

    • Hong, G., & Raudenbush, S. W. (2013). Heterogeneous agents, social interactions, and causal inference. In the Handbook of Causal Analysis for Social Research (pp.331-352) edited by Stephen L. Morgan. NY: Springer.
    • VanderWeele, T., Hong, G., Jones, S., & Brown, J. (2013). Mediation and spillover effects in group-randomized trials: A case study of the 4R’s educational intervention. Journal of the American Statistical Association, 108(502), 469-482.
      •  Peer influence and social interactions can give rise to spillover effects in which the exposure of one individual may affect outcomes of other individuals. Even if the intervention under study occurs at the group or cluster level as in group-randomized trials, spillover effects can occur when the mediator of interest is measured at a lower level than the treatment. Evaluators who choose groups rather than individuals as experimental units in a randomized trial often anticipate that the desirable changes in targeted social behaviors will be reinforced through interference among individuals in a group exposed to the same treatment. In an empirical evaluation of the effect of a school-wide intervention on reducing individual students’ depressive symptoms, schools in matched pairs were randomly assigned to the 4Rs intervention or the control condition. Class quality was hypothesized as an important mediator assessed at the classroom level. We reason that the quality of one classroom may affect outcomes of children in another classroom because children interact not simply with their classmates but also with those from other classes in the hallways or on the playground. In investigating the role of class quality as a mediator, failure to account for such spillover effects of one classroom on the outcomes of children in other classrooms can potentially result in bias and problems with interpretation. Using a counterfactual conceptualization of direct, indirect, and spillover effects, we provide a framework that can accommodate issues of mediation and spillover effects in group randomized trials. We show that the total effect can be decomposed into a natural direct effect, a within-classroom mediated effect, and a spillover mediated effect. We give identification conditions for each of the causal effects of interest and provide results on the consequences of ignoring “interference” or “spillover effects” when they are in fact present. 
Our modeling approach disentangles these effects. The analysis examines whether the 4Rs intervention has an effect on children’s depressive symptoms through changing the quality of other classes as well as through changing the quality of a child’s own class. Supplementary materials for this article are available online.
  • Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multi-level observational data. Journal of the American Statistical Association, 101(475), 901-910. (List of Covariates; Propensity Covariates)
    •  This article considers the policy of retaining low-achieving children in kindergarten rather than promoting them to first grade. Under the stable-unit-treatment-value assumption (SUTVA) as articulated by Rubin, each child at risk of retention has two potential outcomes: Y(1) if retained and Y(0) if promoted. However, SUTVA is questionable because a child’s potential outcomes will plausibly depend on which school that child attends and also on treatment assignments of other children. We develop a causal model that allows school assignment and peer treatments to affect one’s potential outcomes. We impose an identifying assumption that peer effects can be summarized via a scalar function of the vector of treatment assignments in a school. Using a large, nationally representative sample, we then estimate: (1) the effect of being retained in kindergarten rather than being promoted to the first grade in schools having a low retention rate; (2) the retention effect in schools having a high retention rate; and (3) the effect of being promoted in a low-retention school as compared to being promoted in a high-retention school. This third effect is not definable under SUTVA. We use multi-level propensity-score stratification to approximate a two-stage experiment. At the first stage, intact schools are blocked on covariates and then, within blocks, randomly assigned to a policy of retaining comparatively more or fewer children in kindergarten. At the second stage, “at risk” students within schools are blocked on covariates and then assigned at random to be retained. We find evidence that retainees learned less, on average, than did similar children who were promoted, a result found in both high-retention and low-retention schools. We do not detect a peer treatment effect on low-risk students.
  • Hong, G., & Raudenbush, S. W. (2003). Causal Inference for Multi-Level Observational Data with Application to Kindergarten Retention Study. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp.1849-1856.

Causal Inference Methods for Non-Experimental Data

  • Hong, G., & Chung, H.-J. (online first). Assessing the impact of the Great Recession on the transition to adulthood. Sociological Methods & Research.

The impact of a major historical event on child and youth development has been of great interest in the study of the life course. This study is focused on assessing the causal effect of the Great Recession on youth disconnection from school and work. Building on the insights offered by the age-period-cohort research, econometric methods, and developmental psychology, we innovatively develop a causal inference strategy that takes advantage of the multiple successive birth cohorts in the National Longitudinal Survey of Youth 1997. The causal effect of the Great Recession is defined in terms of counterfactual developmental trajectories and can be identified under the assumption of short-term stable differences between the birth cohorts in the absence of the Great Recession. A meta-analysis aggregates the estimated effects over six between-cohort comparisons. Furthermore, we conduct a sensitivity analysis to assess the potential consequences if the identification assumption is violated. The findings contribute new evidence on how precipitous and pervasive economic hardship may disrupt youth development by gender and class of origin.

  • Hong, G., Yang, F., & Qin, X. (2021). Did you conduct a sensitivity analysis? A new weighting-based approach for evaluations of the average treatment effect for the treated. Journal of the Royal Statistical Society, Series A: Statistics in Society, 184(1), 227-254.

In nonexperimental research, a sensitivity analysis helps determine whether a causal conclusion could be easily reversed in the presence of hidden bias. A new approach to sensitivity analysis on the basis of weighting extends and supplements propensity score weighting methods for identifying the average treatment effect for the treated (ATT). In its essence, the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them captures the role of the confounders. This strategy is appealing for a number of reasons including that, regardless of how complex the data generation functions are, the number of sensitivity parameters remains small and their forms never change. A graphical display of the sensitivity parameter values facilitates a holistic assessment of the dominant potential bias. An application to the well-known LaLonde data lays out the implementation procedure and illustrates its broad utility. The data offer a prototypical example of non-experimental evaluations of the average impact of job training programs for the participant population.

  • Hong, G., Nomi, T., & Yu, B. (2012). Prognostic score-based difference-in-differences. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, 4952-4966.


  • Hong, G., & Yu, B. (2008). Effects of kindergarten retention on children’s social-emotional development: An application of propensity score method to multivariate multi-level data. Developmental Psychology (Special Section on New Methods), 44(2), 407-421.
    • This study examines the effects of kindergarten retention on children’s social-emotional development in the early, middle, and late elementary years. Previous studies have generated mixed results partly due to some major methodological challenges, including selection bias, measurement error, and divergent perceptions of multiple respondents in different domains of child development. The authors address these challenges by using propensity score stratification to contend with selection bias and by embedding measurement models in hierarchical models to account for measurement error and to model dependence among observations. The authors’ analyses of a series of multivariate models enable them to compare the retention effects across different respondents over different time points. In general, the results show no evidence suggesting that kindergarten retention does harm to children’s social-emotional development. Rather, the findings suggest that, had the retained kindergartners been promoted to the first grade instead, they would possibly have developed a lower level of self-confidence and interest in reading and all school subjects 2 years later and would have displayed a higher level of internalizing problem behaviors at the end of the treatment year and 2 years later.
  • Hong, G., & Yu, B. (2007). Early grade retention and children’s reading and math learning in elementary years. Educational Evaluation and Policy Analysis, 29(4), 239-261.
    • Many schools have adopted early grade retention as an intervention strategy for children displaying academic or behavioral problems. Previous analyses of the US Early Childhood Longitudinal Study-Kindergarten cohort data have found evidence suggesting that children who are retained in kindergarten learn less during the repeated year than they would have were they promoted to the first grade instead. Will the kindergarten retainees recover their lost ground and excel in the long run? Meanwhile, for children who are held back in first grade rather than in kindergarten, will the retention experience have a negative effect on academic learning as observed for kindergarten retainees? This paper evaluates the effects of early grade retention on children’s cognitive development in reading and mathematics during the elementary years. Our results suggest that the negative effects of kindergarten retention on the retainees’ reading and math outcomes at the end of the treatment year substantially fade by fifth grade. We also find evidence that children repeating first grade would learn more in reading and math if promoted instead. The negative effects of first grade retention stay almost constant from one year after the treatment to three years later. In general, we find no evidence that early grade retention brings benefits to the retainees’ reading and math learning toward the end of the elementary years.
  • Hong, G., & Raudenbush, S. W. (2005). Effects of kindergarten retention policy on children’s cognitive growth in reading and mathematics. Educational Evaluation and Policy Analysis, 27(3), 205-224.
    • Grade retention has been controversial for many years, and current calls to end social promotion have lent new urgency to this issue. On the one hand, a policy of retaining in grade those students making slow progress might facilitate instruction by making classrooms more homogeneous academically. On the other hand, grade retention might harm high-risk students by limiting their learning opportunities. Analyzing data from the US Early Childhood Longitudinal Study Kindergarten cohort with the technique of multi-level propensity score stratification, we find no evidence that a policy of grade retention in kindergarten improves average achievement in mathematics or reading. Nor do we find evidence that the policy benefits children who would be promoted under the policy. However, the evidence does suggest that children who are retained learn less than they would have had they instead been promoted. The negative effect of grade retention on those retained has little influence on the overall mean achievement of children attending schools with a retention policy because the fraction of children retained in those schools is quite small. Nevertheless, the effect of retention on the retainees is considerably large.

Hong, G. (2012). Editorial comments. Journal of Research on Educational Effectiveness, 5(3), 213-214. (Special issue on the statistical approaches to studying mediator effects in education research.)