Methods for Causal Mediation Analysis with Applications

Qin, X., Deutsch, J., & Hong, G. (2021). Revealing heterogeneity in complex mediation mechanisms: Two concurrent mediators. Journal of Policy Analysis and Management, 40(1), 158–190.
This study aims to test the theory underlying Job Corps, one of the largest education and training programs in the U.S. serving disadvantaged youth. Central to the program are vocational training and general education that serve as two concurrent mediators transmitting the program impact on earnings. To distinguish the relative contribution of each, we develop methods for decomposing the Job Corps impact on earnings into an indirect effect transmitted through vocational training, an indirect effect transmitted through general education, and a direct effect attributable to supplementary services. We further ask whether general education and vocational training reinforce each other and produce a joint impact greater than the sum of the two separate pathways. Moreover, we examine the heterogeneity of each causal effect across all the Job Corps centers. This article presents concepts and methods for defining, identifying, and estimating not only the population averages but also the between-site variance of these causal effects. Our analytic procedure incorporates a series of weighting strategies to enhance the internal and external validity of the results and assesses the sensitivity to potential violations of the identification assumptions.

Qin, X., Hong, G., Deutsch, J., & Bein, E. (2019). Multisite causal mediation analysis in the presence of complex sample and survey designs and nonrandom nonresponse. Journal of the Royal Statistical Society, Series A, 182, Part 4, 1343–1370.
This study provides a template for multisite causal mediation analysis using a comprehensive weighting-based analytic procedure that enhances external and internal validity. The template incorporates a sample weight to adjust for complex sample and survey designs, adopts an inverse probability of treatment weight to adjust for differential treatment assignment probabilities, employs an estimated nonresponse weight to account for nonrandom nonresponse, and utilizes a propensity-score-based weighting strategy to decompose flexibly not only the population average but also the between-site heterogeneity of the total programme impact. Because the identification assumptions are not always warranted, a weighting-based balance checking procedure assesses the remaining overt bias, whereas a weighting-based sensitivity analysis further evaluates the potential bias related to omitted confounding or to propensity score model misspecification. We derive the asymptotic variance of the estimators for the causal effects that account for the sampling uncertainty in the estimated weights. The method is applied to a reanalysis of the data from the National Job Corps Study.

Hong, G., Qin, X., & Yang, F. (2018). Weighting-based sensitivity analysis in causal mediation studies. Journal of Educational and Behavioral Statistics, 43(1), 32–56.
Through a sensitivity analysis, the analyst attempts to determine whether a conclusion of causal inference could be easily reversed by a plausible violation of an identification assumption. Analytic conclusions that are harder to alter by such a violation are expected to add a higher value to scientific knowledge about causality. This article presents a weighting-based approach to sensitivity analysis for causal mediation studies. Extending the ratio-of-mediator-probability weighting (RMPW) method for identifying natural indirect effect and natural direct effect, the new strategy assesses potential bias in the presence of omitted pretreatment or posttreatment covariates. Such omissions may undermine the causal validity of analytic conclusions. The weighting approach to sensitivity analysis reduces the reliance on functional form assumptions and removes constraints on the measurement scales for the mediator, the outcome, and the omitted covariates. In its essence, the discrepancy between a new weight that adjusts for an omitted confounder and an initial weight that omits the confounder captures the role of the confounder that contributes to the bias. The effect size of the bias due to omitted confounding of the mediator–outcome relationship is a product of two sensitivity parameters, one associated with the degree to which the omitted confounders predict the mediator and the other associated with the degree to which they predict the outcome. The article provides an application example and concludes with a discussion of broad applications of this new approach to sensitivity analysis. Supplementary material includes R code for implementing the proposed sensitivity analysis procedure.

Bein, E., Deutsch, J., Hong, G., Porter, K., Qin, X., & Yang, C. (2018). Two-step estimation in RMPW analysis. Statistics in Medicine, 37(8), 1304–1324.
This study investigates appropriate estimation of estimator variability in the context of causal mediation analysis that employs propensity score-based weighting. Such an analysis decomposes the total effect of a treatment on the outcome into an indirect effect transmitted through a focal mediator and a direct effect bypassing the mediator. Ratio-of-mediator-probability weighting (RMPW) estimates these causal effects by adjusting for the confounding impact of a large number of pretreatment covariates through propensity score-based weighting. In step 1, a propensity score model is estimated. In step 2, the causal effects of interest are estimated using weights derived from the prior step’s regression coefficient estimates. Statistical inferences obtained from this two-step estimation procedure are potentially problematic if the estimated standard errors of the causal effect estimates do not reflect the sampling uncertainty in the estimation of the weights. This study extends to RMPW analysis a solution to the two-step estimation problem by stacking the score functions from both steps. We derive the asymptotic variance-covariance matrix for the indirect effect and direct effect two-step estimators, provide simulation results, and illustrate with an application study. Our simulation results indicate that the sampling uncertainty in the estimated weights should not be ignored. The standard error estimation using the stacking procedure offers a viable alternative to bootstrap standard error estimation. We discuss broad implications of this approach for causal analysis involving propensity score-based weighting.
Qin, X., & Hong, G. (2017). A weighting method for assessing between-site heterogeneity in causal mediation mechanism. Journal of Educational and Behavioral Statistics, 42(4), 491–495.
When a multisite randomized trial reveals between-site variation in program impact, methods are needed for further investigating heterogeneous mediation mechanisms across the sites. We conceptualize and identify a joint distribution of site-specific direct and indirect effects under the potential outcomes framework. A method-of-moments procedure incorporating ratio-of-mediator-probability weighting (RMPW) consistently estimates the causal parameters. This strategy conveniently relaxes the assumption of no Treatment × Mediator interaction while greatly simplifying the outcome model specification without invoking strong distributional assumptions. We derive asymptotic standard errors that reflect the sampling variability of the estimated weight. We also offer an easy-to-use R package, MultisiteMediation, that implements the proposed method. It is freely available at the Comprehensive R Archive Network (http://cran.r-project.org/web/packages/MultisiteMediation).
 Raudenbush, S. W., & Hong, G. (2017). Three mediation stories, three analytic strategies. Association for Psychological Science Observer. February 2017.
Qin, X., & Hong, G. (2016). Analyzing heterogeneous causal mediation effects in multisite trials with application to the National Job Corps Study. In JSM Proceedings, Survey Research Methods Section. Alexandria, VA: American Statistical Association, pp. 910–938.
Hong, G., Deutsch, J., & Hill, H. D. (2015). Ratio-of-mediator-probability weighting for causal mediation analysis in the presence of treatment-by-mediator interaction. Journal of Educational and Behavioral Statistics, 40(3), 307–340. (Supplementary Materials)

Conventional methods for mediation analysis generate biased results when the mediator–outcome relationship depends on the treatment condition. This article shows how the ratio-of-mediator-probability weighting (RMPW) method can be used to decompose total effects into natural direct and indirect effects in the presence of treatment-by-mediator interactions. The indirect effect can be further decomposed into a pure indirect effect and a natural treatment-by-mediator interaction effect. Similar to other techniques for causal mediation analysis, RMPW generates causally valid results when the sequential ignorability assumptions hold. Yet unlike the model-based alternatives, including path analysis, structural equation modeling, and their latest extensions, RMPW requires relatively few assumptions about the distribution of the outcome, the distribution of the mediator, and the functional form of the outcome model. Correct specification of the propensity score models for the mediator remains crucial when parametric RMPW is applied. This article gives an intuitive explanation of the RMPW rationale, a mathematical proof, and simulation results for the parametric and nonparametric RMPW procedures. We apply the technique to identifying whether employment mediated the relationship between an experimental welfare-to-work program and maternal depression. A detailed delineation of the analytic procedures is accompanied by online Stata code as well as a standalone RMPW software program to facilitate users’ analytic decision making.
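The weighting idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' Stata code or software. It assumes a randomized binary treatment, a binary mediator, and mediator probabilities that have already been estimated from a propensity score model; the function names (`rmpw_weights`, `decompose`) and argument names (`p1`, `p0`) are hypothetical.

```python
def rmpw_weights(z, m, p1, p0):
    """Return one RMPW weight per unit.

    z  : treatment indicators (0/1), assumed randomized
    m  : binary mediator values (0/1)
    p1 : assumed estimates of Pr(M=1 | Z=1, X) for each unit
    p0 : assumed estimates of Pr(M=1 | Z=0, X) for each unit
    Treated units get Pr(M=m | Z=0, X) / Pr(M=m | Z=1, X); controls get 1.
    """
    weights = []
    for zi, mi, p1i, p0i in zip(z, m, p1, p0):
        if zi == 1:
            num = p0i if mi == 1 else 1.0 - p0i
            den = p1i if mi == 1 else 1.0 - p1i
            weights.append(num / den)
        else:
            weights.append(1.0)
    return weights


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def decompose(z, m, y, p1, p0):
    """Estimate natural direct, natural indirect, and total effects."""
    w = rmpw_weights(z, m, p1, p0)
    treated = [i for i, zi in enumerate(z) if zi == 1]
    control = [i for i, zi in enumerate(z) if zi == 0]
    ones = [1.0] * len(treated)
    # E[Y(1, M(1))]: plain mean among treated units
    y1_m1 = weighted_mean([y[i] for i in treated], ones)
    # E[Y(0, M(0))]: plain mean among control units
    y0_m0 = weighted_mean([y[i] for i in control], [1.0] * len(control))
    # E[Y(1, M(0))]: RMPW-weighted mean among treated units
    y1_m0 = weighted_mean([y[i] for i in treated], [w[i] for i in treated])
    return {
        "nde": y1_m0 - y0_m0,
        "nie": y1_m1 - y1_m0,
        "total": y1_m1 - y0_m0,
    }
```

In this sketch the direct and indirect effects sum to the total effect by construction. A nonrandomized treatment would additionally require a treatment-assignment weight, which is omitted here.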
Qin, X., & Hong, G. (2014). Causal mediation analysis in multisite trials: An application of ratio-of-mediator-probability weighting to the Head Start Impact Study. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp. 912–926.

Hong, G., & Nomi, T. (2012). Weighting methods for assessing policy effects mediated by peer change. Journal of Research on Educational Effectiveness special issue on the statistical approaches to studying mediator effects in education research, 5(3), 261–289.
The conventional approaches to mediation analysis such as path analysis and structural equation modeling typically involve specifying two structural models, one for the mediator and the other for the outcome. We employ an alternative approach that avoids some strong identification assumptions invoked by the conventional approaches. By applying a new weighting procedure to the observed data, we estimate the average potential outcome if the entire population were treated, the average potential outcome if the entire population were untreated, and the average potential outcome if the entire population were treated and if every individual unit’s mediator value would counterfactually remain at the same level as it would be when untreated. The estimated differences among these average potential outcomes provide estimates of the total effect, the natural direct effect, and the natural indirect effect. Applying this approach to multilevel educational data, we evaluate the total effect of the algebra-for-all policy in the Chicago Public Schools by comparing the math achievement of two ninth-grade cohorts. We further investigate whether the policy effect was mediated by the policy-induced change in class peer ability. Combining weighting with prognostic score-based difference-in-differences adjustment enables us to reduce both measured and unmeasured confounding.
Hong, G., & Nomi, T. (2012). Rejoinder. Journal of Research on Educational Effectiveness special issue on the statistical approaches to studying mediator effects in education research, 5(3), 299–302.
Hong, G., Deutsch, J., & Hill, H. (2011). Parametric and nonparametric weighting methods for estimating mediation effects: An application to the National Evaluation of Welfare-to-Work Strategies. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp. 3215–3229. (Supplementary Tables)
Hong, G. (2010b). Ratio of mediator probability weighting for estimating natural direct and indirect effects. In JSM Proceedings, Biometrics Section. Alexandria, VA: American Statistical Association, pp. 2401–2415.



Methods for Causal Moderation Analysis with Applications

Hong, Y., & Hong, G. (in press). Schools with test-based promotion: Effects on instructional time allocation and student learning in third grade. AERA Open.

This study is focused on the threat of retention associated with test-based promotion in grade 3. Through analyzing the Early Childhood Longitudinal Study Kindergarten Class of 1998–99 data, we found that schools having such a policy apparently increased math instructional time but not reading instructional time in grade 3. On average, the policy did not produce significant differences in third graders’ reading and math learning. However, there seemed to be a notable increase in the proportion of students who achieved an above-average proficiency level in grade 3 math. In both reading and math, the test-based promotion seemingly benefited students at the average or lower than average ability levels. In contrast, there was no evidence that the policy had an impact on students at the two ends of the ability distribution. We discussed the implication of the findings for the current design and implementation of test-based promotion in early grades.

Garrett, R., & Hong, G. (2016). Impacts of grouping and time on the math learning of language minority kindergartners. Educational Evaluation and Policy Analysis, 38(2), 222–244.
Previous research has indicated benefits and potential pitfalls of within-class homogeneous and heterogeneous ability grouping for elementary math learning. However, there has been scant evidence with regard to the impacts of grouping for language minority kindergartners who may experience the small group setting differentially due to their particular needs for math and English language skill development. Analyzing the Early Childhood Longitudinal Study–Kindergarten cohort data, we find that heterogeneous grouping or a combination of heterogeneous and homogeneous grouping under relatively adequate time allocation is optimal for enhancing teacher ratings of language minority kindergartners’ math performance, while using homogeneous grouping only is detrimental. As hypothesized, suboptimal instructional organization seems to place language minority kindergartners in a vulnerable situation.
Hong, G. (2012). Marginal mean weighting through stratification: A generalized method for evaluating multivalued and multiple treatments with nonexperimental data. Psychological Methods, 17(1), 44–60.
Propensity score matching and stratification enable researchers to make statistical adjustment for a large number of observed covariates in nonexperimental data. These methods have recently become popular in psychological research. Yet their applications to evaluations of multivalued and multiple treatments are limited. The inverse-probability-of-treatment weighting (IPTW) method, though suitable for evaluating multivalued and multiple treatments, often generates results that are not robust when only a portion of the population provides support for causal inference or when the functional form of the propensity score model is misspecified. The marginal mean weighting through stratification (MMWS) method promises a viable nonparametric solution to these problems. By computing weights on the basis of stratified propensity scores, MMWS adjustment equates the pretreatment composition of multiple treatment groups under the assumption that unmeasured covariates do not confound the treatment effects given the observed covariates. Analyzing data from a weighted sample, researchers can estimate a causal effect by computing the difference between the estimated average potential outcomes associated with alternative treatments within the ANOVA framework. After providing an intuitive illustration of the theoretical rationale underlying the weighting method for causal inferences, the paper demonstrates how to apply the MMWS method to evaluations of treatments measured on a binary, ordinal, or nominal scale approximating a completely randomized experiment, to studies of multiple concurrent treatments approximating factorial randomized designs, and to moderated treatment effects approximating randomized block designs. The analytic procedure is illustrated with an evaluation of educational services for English language learners attending kindergarten in the US.
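As a rough illustration of computing weights from stratified propensity scores, the Python sketch below assigns each unit the weight n_s × Pr(Z = z) / n_{z,s}, where n_s is the size of the unit's stratum and n_{z,s} is the number of units in that stratum receiving the same treatment. This is a simplified reading of the MMWS idea, not the paper's full procedure: the propensity scores are assumed to be already estimated, the equal-size rank-based stratification is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def assign_strata(scores, n_strata):
    """Split units into rank-based strata of roughly equal size (0..n_strata-1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def mmws_weights(z, strata):
    """Weight = n_s * Pr(Z=z) / n_{z,s} for each unit.

    z      : treatment labels (any hashable values, so multivalued is fine)
    strata : stratum label per unit, e.g. from assign_strata
    Assumes every treatment group is represented in every stratum it appears in.
    """
    n = len(z)
    pr_z = {k: c / n for k, c in Counter(z).items()}  # marginal Pr(Z=z)
    n_s = Counter(strata)                             # stratum sizes
    n_zs = Counter(zip(z, strata))                    # treatment-by-stratum cells
    return [n_s[s] * pr_z[zi] / n_zs[(zi, s)] for zi, s in zip(z, strata)]
```

Within each stratum, the weighted share of each treatment group then equals its marginal share in the sample, which is the sense in which the weighted data approximate a randomized design in this sketch.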
Hong, G., Corter, C., Hong, Y., & Pelletier, J. (2012). Differential effects of literacy instruction time and homogeneous grouping in kindergarten: Who will benefit? Who will suffer? Educational Evaluation and Policy Analysis, 34(1), 69–88. (Supplementary Material)
This study challenges the belief that homogeneous ability grouping benefits high-ability students in cognitive and social-emotional development at the expense of their low-ability peers. From a developmental point of view, we hypothesize that homogeneous grouping may improve the learning behaviors and may benefit the literacy learning of kindergartners at all ability levels through adaptive instruction under adequate instructional time. The benefits are expected to be more evident for medium- and low-ability children than for high-ability children. However, when instructional time is limited, low-ability children may suffer from high-intensity grouping, defined as grouping taking up a large proportion of instructional time. We also examine whether low-ability kindergartners develop lower self-esteem as a result of homogeneous grouping. Analyzing the Early Childhood Longitudinal Study Kindergarten cohort (ECLS-K) data, we find no overall advantage of homogeneous grouping for high-ability students. For medium-ability students’ literacy growth, homogeneous grouping appears to be optimal when teachers spend more than one hour per day on literacy instruction; high-intensity grouping shows additional advantage for improving these students’ general learning behaviors. For low-ability kindergartners, homogeneous grouping with ample instruction time seems to improve their general learning behaviors whereas low-intensity grouping with ample instruction time seems to reduce internalizing problem behaviors. Yet for low-ability students’ literacy growth, we find a detrimental effect of high-intensity grouping when instructional time is limited. These findings contradict results from past research and have important implications for educational theories and practice.
Hong, G. (2010a). Marginal mean weighting through stratification: Adjustment for selection bias in multilevel data. Journal of Educational and Behavioral Statistics, 35(5), 499–531.
Defining causal effects as comparisons between marginal population means, this paper introduces marginal mean weighting through stratification (MMWS) to adjust for selection bias in multilevel educational data. The paper formally shows the inherent connections among the MMWS method, propensity score stratification, and inverse-probability-of-treatment weighting (IPTW). Both MMWS and IPTW are suitable for evaluating multiple concurrent treatments, hence have broader applications than matching, stratification, or covariance adjustment for the propensity score. Furthermore, mathematical consideration and a series of simulations reveal that the MMWS method has incorporated some important strengths of the propensity score stratification method, which generally enhance the robustness of MMWS estimates in comparison with IPTW estimates. To illustrate, I apply the MMWS method to evaluations of within-class homogeneous grouping in early elementary reading instruction.
Hong, G., & Hong, Y. (2009). Reading instruction time and homogeneous grouping in kindergarten: An application of marginal mean weighting through stratification. Educational Evaluation and Policy Analysis, 31(1), 54–81.
A kindergartner’s opportunities to develop reading and language arts skills are constrained by the amount of time allocated to reading instruction. In the meantime, the student’s engagement in learning tasks may increase if the instruction has been adapted to his or her prior ability through homogeneous grouping. This study investigates whether the grouping effects on kindergartners’ reading growth depend on the amount of reading instruction time and the intensity of grouping. To answer our research questions requires causal inferences about concurrent multivalued instructional treatments. We develop a procedure of applying the method of marginal mean weighting through stratification to multilevel educational data. Results from the Early Childhood Longitudinal Study Kindergarten cohort (ECLS-K) data set lend support to our theoretical hypothesis that, when teachers allocate a substantial amount of time to reading instruction, homogeneous grouping helps kindergartners to gain more in reading. We find no effect of homogeneous grouping when the total amount of reading time is limited. We also find that the benefit of increasing reading instruction time becomes evident only if kindergarten teachers adapt instruction through homogeneous grouping.
Hong, G., & Raudenbush, S. W. (2008). Causal inference for time-varying instructional treatments. Journal of Educational and Behavioral Statistics, 33(3), 333–362.


Understanding the impact of sequences of instructional experiences on children’s learning is central to the study of teaching and learning in school settings. In this paper, we model the effects of time-varying instructional treatments on repeatedly observed student achievement. In doing so, we confront three challenges to causal inference: 1) the yearly reallocation of students to classrooms and teachers creates a complex structure of dependence among responses; 2) a child’s learning outcome under a certain treatment may depend on the treatment assignment of other children as well as the skill of the teacher, and may depend on the classmates and teachers that the child encountered in the past years; and 3) time-varying confounding poses special problems of endogeneity in these complex multilevel settings. We address these challenges by modifying the Stable-Unit-Treatment-Value Assumption (SUTVA) to identify potential outcomes and causal effects and by integrating inverse-probability-of-treatment weighting (IPTW) into a four-way value-added hierarchical model with pseudo-likelihood estimation. Using data from the Longitudinal Analysis of School Change and Performance (LESCP), we apply these methods to study the impact of “intensive math instruction” in grades 4 and 5.
Methods for Analyzing Spillover Effects with Applications

Hong, G., & Raudenbush, S. W. (2013). Heterogeneous agents, social interactions, and causal inference. In the Handbook of Causal Analysis for Social Research (pp. 331–352) edited by Stephen L. Morgan. NY: Springer.
VanderWeele, T., Hong, G., Jones, S., & Brown, J. (2013). Mediation and spillover effects in group-randomized trials: A case study of the 4Rs educational intervention. Journal of the American Statistical Association, 108(502), 469–482.
Peer influence and social interactions can give rise to spillover effects in which the exposure of one individual may affect outcomes of other individuals. Even if the intervention under study occurs at the group or cluster level as in group-randomized trials, spillover effects can occur when the mediator of interest is measured at a lower level than the treatment. Evaluators who choose groups rather than individuals as experimental units in a randomized trial often anticipate that the desirable changes in targeted social behaviors will be reinforced through interference among individuals in a group exposed to the same treatment. In an empirical evaluation of the effect of a schoolwide intervention on reducing individual students’ depressive symptoms, schools in matched pairs were randomly assigned to the 4Rs intervention or the control condition. Class quality was hypothesized as an important mediator assessed at the classroom level. We reason that the quality of one classroom may affect outcomes of children in another classroom because children interact not simply with their classmates but also with those from other classes in the hallways or on the playground. In investigating the role of class quality as a mediator, failure to account for such spillover effects of one classroom on the outcomes of children in other classrooms can potentially result in bias and problems with interpretation. Using a counterfactual conceptualization of direct, indirect, and spillover effects, we provide a framework that can accommodate issues of mediation and spillover effects in group-randomized trials. We show that the total effect can be decomposed into a natural direct effect, a within-classroom mediated effect, and a spillover mediated effect. We give identification conditions for each of the causal effects of interest and provide results on the consequences of ignoring “interference” or “spillover effects” when they are in fact present. Our modeling approach disentangles these effects. The analysis examines whether the 4Rs intervention has an effect on children’s depressive symptoms through changing the quality of other classes as well as through changing the quality of a child’s own class. Supplementary materials for this article are available online.

Hong, G., & Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. Journal of the American Statistical Association, 101(475), 901–910. (List of Covariates; Propensity Covariates)
This article considers the policy of retaining low-achieving children in kindergarten rather than promoting them to first grade. Under the stable-unit-treatment-value assumption (SUTVA) as articulated by Rubin, each child at risk of retention has two potential outcomes: Y(1) if retained and Y(0) if promoted. However, SUTVA is questionable because a child’s potential outcomes will plausibly depend on which school that child attends and also on treatment assignments of other children. We develop a causal model that allows school assignment and peer treatments to affect one’s potential outcomes. We impose an identifying assumption that peer effects can be summarized via a scalar function of the vector of treatment assignments in a school. Using a large, nationally representative sample, we then estimate: (1) the effect of being retained in kindergarten rather than being promoted to the first grade in schools having a low retention rate; (2) the retention effect in schools having a high retention rate; and (3) the effect of being promoted in a low-retention school as compared to being promoted in a high-retention school. This third effect is not definable under SUTVA. We use multilevel propensity-score stratification to approximate a two-stage experiment. At the first stage, intact schools are blocked on covariates and then, within blocks, randomly assigned to a policy of retaining comparatively more or fewer children in kindergarten. At the second stage, “at risk” students within schools are blocked on covariates and then assigned at random to be retained. We find evidence that retainees learned less, on average, than did similar children who were promoted, a result found in both high-retention and low-retention schools. We do not detect a peer treatment effect on low-risk students.
Hong, G., & Raudenbush, S. W. (2003). Causal Inference for Multi-Level Observational Data with Application to Kindergarten Retention Study. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp. 1849–1856.


Causal Inference Methods for Non-Experimental Data

Hong, G., Yang, F., & Qin, X. (2021). Did you conduct a sensitivity analysis? A new weighting-based approach for evaluations of the average treatment effect for the treated. Journal of the Royal Statistical Society, Series A: Statistics in Society, 184(1), 227–254.
In nonexperimental research, a sensitivity analysis helps determine whether a causal conclusion could be easily reversed in the presence of hidden bias. A new approach to sensitivity analysis on the basis of weighting extends and supplements propensity score weighting methods for identifying the average treatment effect for the treated (ATT). In its essence, the discrepancy between a new weight that adjusts for the omitted confounders and an initial weight that omits them captures the role of the confounders. This strategy is appealing for a number of reasons including that, regardless of how complex the data generation functions are, the number of sensitivity parameters remains small and their forms never change. A graphical display of the sensitivity parameter values facilitates a holistic assessment of the dominant potential bias. An application to the well-known LaLonde data lays out the implementation procedure and illustrates its broad utility. The data offer a prototypical example of nonexperimental evaluations of the average impact of job training programs for the participant population.
Hong, G., Nomi, T., & Yu, B. (2012). Prognostic score-based difference-in-differences. In JSM Proceedings, Social Statistics Section. Alexandria, VA: American Statistical Association, pp. 4952–4966.
Hong, G., & Yu, B. (2008). Effects of kindergarten retention on children’s social-emotional development: An application of propensity score method to multivariate multilevel data. Special Section on New Methods in Developmental Psychology, 44(2), 407–421.
This study examines the effects of kindergarten retention on children’s social-emotional development in the early, middle, and late elementary years. Previous studies have generated mixed results partly due to some major methodological challenges, including selection bias, measurement error, and divergent perceptions of multiple respondents in different domains of child development. The authors address these challenges by using propensity score stratification to contend with selection bias and by embedding measurement models in hierarchical models to account for measurement error and to model dependence among observations. The authors’ analyses of a series of multivariate models enable them to compare the retention effects across different respondents over different time points. In general, the results show no evidence suggesting that kindergarten retention does harm to children’s social-emotional development. Rather, the findings suggest that, had the retained kindergartners been promoted to the first grade instead, they would possibly have developed a lower level of self-confidence and interest in reading and all school subjects 2 years later and would have displayed a higher level of internalizing problem behaviors at the end of the treatment year and 2 years later.
Hong, G., & Yu, B. (2007). Early grade retention and children’s reading and math learning in elementary years. Educational Evaluation and Policy Analysis, 29(4), 239–261.
Many schools have adopted early grade retention as an intervention strategy for children displaying academic or behavioral problems. Previous analyses of the US Early Childhood Longitudinal Study–Kindergarten cohort data have found evidence suggesting that children who are retained in kindergarten learn less during the repeated year than they would have were they promoted to the first grade instead. Will the kindergarten retainees recover their lost ground and excel in the long run? Meanwhile, for children who are held back in first grade rather than in kindergarten, will the retention experience have a negative effect on academic learning as observed for kindergarten retainees? This paper evaluates the effects of early grade retention on children’s cognitive development in reading and mathematics during the elementary years. Our results suggest that the negative effects of kindergarten retention on the retainees’ reading and math outcomes at the end of the treatment year substantially fade by fifth grade. We also find evidence that children repeating first grade would learn more in reading and math if promoted instead. The negative effects of first grade retention stay almost constant from one year after the treatment to three years later. In general, we find no evidence that early grade retention brings benefits to the retainees’ reading and math learning toward the end of the elementary years.
Hong, G., & Raudenbush, S. W. (2005). Effects of kindergarten retention policy on children’s cognitive growth in reading and mathematics. Educational Evaluation and Policy Analysis, 27(3), 205–224.


Grade retention has been controversial for many years, and current calls to end social promotion have lent new urgency to this issue. On the one hand, a policy of retaining in grade those students making slow progress might facilitate instruction by making classrooms more homogeneous academically. On the other hand, grade retention might harm high-risk students by limiting their learning opportunities. Analyzing data from the US Early Childhood Longitudinal Study Kindergarten cohort with the technique of multilevel propensity score stratification, we find no evidence that a policy of grade retention in kindergarten improves average achievement in mathematics or reading. Nor do we find evidence that the policy benefits children who would be promoted under the policy. However, the evidence does suggest that children who are retained learn less than they would have had they instead been promoted. The negative effect of grade retention on those retained has little influence on the overall mean achievement of children attending schools with a retention policy because the fraction of children retained in those schools is quite small. Nevertheless, the effect of retention on the retainees is considerably large.

Hong, G. (2012). Editorial comments. Journal of Research on Educational Effectiveness special issue on the statistical approaches to studying mediator effects in education research, 5(3), 213–214.