Instructor: Yanyan Sheng
As illustrated in our stylized case, a STEM education study often requires the researcher to develop innovative, valid, and reliable measures. These measures must capture both the quantity and quality of the math or science learning opportunities that instruction and curriculum provide, as well as STEM learning outcomes. This need is reflected in the growing number of projects, many funded by the National Science Foundation, that are investigating how psychometric models can be used to enhance STEM education research. The new three-dimensional Next Generation Science Standards pose unique challenges to assessment and demand new ways of measuring STEM learning so that students are assessed on complex science thinking. To meet this demand, STEM education researchers must be able to design valid instruments well tailored to this task.

This course is ideal for early- and mid-career STEM education researchers who wish to gain a clear understanding of the core principles and techniques of measurement and psychometrics, and to develop competence in applying psychometric methods to practical testing problems and to research problems in STEM education. The course presents the theory and practice of classical test theory, broadly viewed as the traditional approach, as well as "modern" measurement techniques such as item response theory (IRT). For the traditional approach, the course focuses on the basic psychometric concept of scaling, classical test theory and its approach to test reliability, principles and procedures for investigating test validity, statistical issues in using tests for selection and classification, principles of test construction, and approaches to item analysis. Fellows will also learn how to use factor analysis to establish construct validity.
For IRT, the course examines the most widely used unidimensional IRT models for dichotomous and polytomous item response data, the assumptions underlying these models, ways to evaluate those assumptions, and methods for assessing model fit. It also covers major applications of IRT, including scale construction, test equating, and detection of differential item functioning (DIF). Fellows will come to appreciate the critical differences between IRT and classical test theory and the relative advantages of IRT, and they will understand the conceptual connections between factor analysis and IRT. After participating in the lectures and discussions and completing the hands-on activities with real data examples, fellows will be able to fit an IRT model to their own data using freely available software. They will also be able to test the assumptions underlying an IRT model in a given application, understand the consequences of violating those assumptions, and evaluate how well an IRT model fits the data.