This is an unofficial list of quantitative courses anticipated to be offered in the coming year. Finalized course schedules are published on the Registrar’s Course Search page.

### Booth School of Business

**BUSN 36906** Stochastic Processes

**BUSN 37906** Applied Bayesian Econometrics

**BUSN 41000** Business Statistics

**BUSN 41100** Applied Regression Analysis

**BUSN 41201** Big Data

**BUSN 41202** Analysis of Financial Time Series

**BUSN 41203** Financial Econometrics

**BUSN 41204** Machine Learning

**BUSN 41301** Statistical Insight into Marketing, Consulting, and Entrepreneurship

**BUSN 41600** Econometrics and Statistics Colloquium

**BUSN 41901/STAT 32400** Probability and Statistics

**BUSN 41902/STAT 32900** Statistical Inference

**BUSN 41903** Applied Econometrics

**BUSN 41910/STAT 33500** Time Series Analysis for Forecasting and Model Building

**BUSN 41912** Applied Multivariate Analysis

### Comparative Human Development

**CHDV 30102/MACS 50100/PBHS 43201/SOCI 30315/STAT 31900** Introduction to Causal Inference

### Committee on Clinical and Translational Science (CCTS)

**CCTS 40500** Machine Learning & Advanced Analytics for Biomedicine

### Economics

**ECON 31000** Empirical Analysis I

**ECON 31100** Empirical Analysis II

**ECON 31200** Empirical Analysis III

**ECON 31703** Topics in Econometrics

**ECON 31720** Applied Microeconometrics

**ECON 31740** Optimization-Conscious Econometrics

**ECON 31800** Advanced Econometrics

### Social Sciences Division

**MACS 30301** Introduction to Bayesian Statistics

**MACS 33002** Introduction to Machine Learning

**MACS 40200** Structural Estimation

**MACS 40800** Unsupervised Machine Learning

**MACS 60000/SOCI 40133/CHDV 30510** Computational Content Analysis

**MAPS 31701** Data Analysis & Statistics

**MAPS 31702** Data Science

**MAPS 31760** Concepts, Assumptions, Data and Inference in Quantitative Research Methodology

**SOSC 36006** Foundations of Statistical Theory

**SOSC 36007** Overview of Quantitative Methods in Social and Behavioral Sciences

**SOSC 36008** Principles of Measurement

**SOSC 36009** Introductory Statistical Methods and Applications

### Public Health Sciences

**PBHS 31001** Epidemiologic Methods

**PBHS 33300/STAT 36900** Applied Longitudinal Data Analysis

**PBHS 33400** Multilevel Modeling

**PBHS 33500** Statistical Applications

**PBHS 40500** Advanced Epidemiologic Methods

### Political Science

**PLSC 30700** Introduction to Linear Models

**PLSC 43100** Maximum Likelihood

**PLSC 43401** Mathematical Foundations of Political Methodology

**PLSC 57200** Network Analysis

### Harris School of Public Policy

**PPHA 31002** Statistical Data Analysis I

**PPHA 31202** Advanced Statistics for Data Analysis I

**PPHA 34600** Program Evaluation

**PPHA 41400** Applied Regression Analysis: Analysis of Microeconomic Data

**PPHA 41420** Multilevel Regression Modeling for Public Policy

**PPHA 41600** Survey Research Methodology

**PPHA 42000** Applied Econometrics I

### Psychology

**PSYC 36211** Mathematical Methods for Biological Sciences II

### Sociology

**SOCI 30004** Statistical Methods of Research 1

**SOCI 30005** Statistical Methods of Research 2

**SOCI 30112** Applications of Hierarchical Linear Models

**SOCI 30253/MACS 54000** Introduction to Spatial Data Science

**SOCI 40103** Event History Analysis

**SOCI 40217/GEOG 40217/MACS 55000** Spatial Regression Analysis

**SOCI 50123** Seminar: Elegant Models for Social Structure, Probability and Non-Probability Applications

### Statistics

**STAT 22000** Statistical Models and Applications

**STAT 22200** Linear Models and Experimental Design

**STAT 22400/PBHS 32400** Applied Regression Analysis

**STAT 22600/PBHS 32600** Analysis of Categorical Data

**STAT 22700/PBHS 32700** Biostatistical Methods

**STAT 23400** Statistical Models and Methods

**STAT 24400/STAT 30030** Statistical Theory and Methods I

**STAT 24500/STAT 30040** Statistical Theory and Methods II

**STAT 24410** Statistical Theory and Methods Ia

**STAT 24510** Statistical Theory and Methods IIa

**STAT 24620/STAT 32950** Multivariate Statistical Analysis: Applications and Techniques

**STAT 25100** Introduction to Mathematical Probability

**STAT 25150** Introduction to Mathematical Probability-A

**STAT 25300/31700** Introduction to Probability Models

**STAT 26100/33600** Time Dependent Data

**STAT 27400/37400** Nonparametric Inference

**STAT 27700/CMSC 25300** Mathematical Foundations of Machine Learning

**STAT 27850/30850** Multiple Testing, Modern Inference, and Replicability

**STAT 30100** Mathematical Statistics-1

**STAT 30200** Mathematical Statistics-2

**STAT 30400** Distribution Theory

**STAT 30600** Advanced Statistical Inference I

**STAT 30750** Numerical Linear Algebra

**STAT 30800** Advanced Statistical Inference II

**STAT 30810** High Dimensional Time Series Analysis

**STAT 31150/CAAM 31150** Inverse Problems and Data Assimilation

**STAT 31200** Introduction to Stochastic Processes I

**STAT 32940/FINM 33180/CAAM 32940** Multivariate Data Analysis via Matrix Decompositions

**STAT 33100** Sample Surveys

**STAT 33910/FINM 33170** Financial Statistics: Time Series, Forecasting, Mean Reversion, and High Frequency Data

**STAT 34300** Applied Linear Statistical Methods

**STAT 34700** Generalized Linear Models

**STAT 34800** Modern Methods in Applied Statistics

**STAT 35920** Applied Bayesian Modeling and Inference

**STAT 37601/CMSC 25025** Machine Learning and Large-Scale Data Analysis

**STAT 37710/CAAM 37710/CMSC 35400** Machine Learning

**STAT 37790/CMSC 35425** Topics in Statistical Machine Learning

**STAT 41530** Topics in Causal Inference

### Course Descriptions

**BUSN 36906-50 Stochastic Processes (Winter)**
No description available. PhD students only.

**BUSN 37906-50 Applied Bayesian Econometrics (Winter)**
This course will discuss applications of Bayesian methods to microeconometric problems. We will focus in particular on issues pertaining to panel data models with unobserved heterogeneity and the use of hierarchical models to deal with them. While the course is more generally useful, the applications and illustrations will focus on marketing and industrial organization.

*Prereq: PhD students only*

**BUSN 41000 Business Statistics (Fall/Winter/Spring)**
Data science. Machine learning. Statistics. Predictive analytics. No matter what it’s called, modern business runs on data. This course is an introduction to the fundamentals of probability and statistics, with the aim of building foundational skills in modern data science. Topics covered include: 1) exploratory data analysis and descriptive statistics; 2) basic probability, common pitfalls, and fallacies; 3) statistical modeling, inference, p-values, and A/B testing; 4) prediction, regression, and classification; 5) ethics and privacy in data analysis. Emphasis will be placed on sound statistical reasoning, real-world applications, and case studies.

**BUSN 41100 Applied Regression Analysis (Fall/Winter/Spring)**
This course is about regression, a powerful and widely used data analysis technique for understanding how different random quantities relate to one another. Students will learn how to use regression to analyze a variety of complex real-world problems, with the aim of understanding data and predicting future events. Focus is placed on understanding fundamental concepts, developing the skills necessary for robust application of regression techniques, and implementing them in a statistical programming language (R, MATLAB, or an alternative). Examples are used throughout to illustrate the application of the tools. Topics covered include: (i) a short review of simple linear regression; (ii) multiple regression (understanding the model, inference and interpretation for parameters, model building and selection, diagnostics, and prediction); (iii) generalized linear models (e.g., logistic regression); (iv) time series models (autocorrelation functions, autoregression, prediction); (v) time permitting, panel data models and causal inference. Prereq: Business 41000 or familiarity with the topics covered in Business 41000. This course is only for students with a solid background in statistics and preferably some prior exposure to linear regression.

**BUSN 41201 Big Data (Spring)**
BUS 41201 is a course about data mining: the analysis, exploration, and simplification of large high-dimensional datasets. Students will learn how to model and interpret complicated ‘Big Data’ and become adept at building powerful models for prediction and classification. Techniques covered include an advanced overview of linear and logistic regression, model choice and false discovery rates, multinomial and binary regression, classification, decision trees, factor models, clustering, the bootstrap, and cross-validation. We learn both basic underlying concepts and practical computational skills, including techniques for the analysis of distributed data. Heavy emphasis is placed on the analysis of actual datasets and on the development of application-specific methodology. Among other examples, we will consider consumer database mining, internet and social media tracking, network analysis, and text mining. Prereq: Bus 41000 (or 41100). Students cannot enroll in BUSN 41201 if they have previously taken BUSN 20800.

**BUSN 41202 Analysis of Financial Time Series (Spring)**
This course focuses on the theory and applications of financial time series analysis, especially in volatility modeling and risk management. Students are expected to gain practical experience in analyzing financial and macroeconomic data. Real examples are used throughout the course. The topics discussed include the following: (1) Analysis of asset returns: autocorrelation, business cycles, stationarity, predictability, and prediction; simple linear models and regression models with serially correlated errors. (2) Volatility models: GARCH-type models, GARCH-M models, the EGARCH model, the GJR model, the stochastic volatility model, and long-range dependence. (3) Forecast evaluation: out-of-sample prediction and backtesting. (4) High-frequency data analysis (market microstructure): transactions data, non-synchronous trading, bid-ask bounce, duration models, logistic and ordered probit models for price changes, and realized volatility. (5) Nonlinearities in financial data: simple nonlinear models, Markov switching and threshold models, and neural networks. (6) Continuous-time models: simple continuous-time and diffusion models, Ito’s lemma, Black-Scholes pricing formulas, and jump diffusion models. (7) Value at Risk and expected shortfall: RiskMetrics, extreme value analysis, peaks over threshold, and quantile regression. (8) Multivariate series: cross-correlation matrices, simple vector AR models, cointegration and threshold cointegration, pairs trading, factor models, and multivariate volatility models. The R software is used throughout the course; no prior knowledge of it is needed. All programs used will be discussed in class and in review sessions. Prereq: Business 41000 (or 41100).

**BUSN 41203 Financial Econometrics (Winter)**
This course covers a variety of topics in financial econometrics. The topics covered are of real-world, practical interest and are closely linked to material covered in other advanced finance courses. Topics include ARMA models, volatility models (GARCH), factor models, models for time-varying correlations, analysis of panel data, cointegration models for long-run co-movement between prices, models for transactions data, and the analysis of transaction costs. Prereq: Business 41000 (or 41100), or instructor consent. Students cannot enroll in BUSN 41203 if they have previously taken BUSN 20820.

**BUSN 41204 Machine Learning (Winter)**
Students will learn about state-of-the-art machine learning techniques and how to apply them to business-related problems. Techniques will be introduced in the context of business applications, with emphasis on how machine learning can be used to create value and provide insights from data. The first and largest part of the class will focus on predictive analytics. Students will learn about decision trees, nearest neighbor classifiers, boosting, random forests, deep neural networks, naive Bayes, and support vector machines. Among other examples, we will apply these techniques to detecting spam in email, click-through rate prediction in online advertising, image classification, face recognition, sentiment analysis, and churn prediction. Students will learn which techniques to apply and why. In the second part of the class, students will learn about unsupervised techniques for extracting actionable patterns from data. Examples include clustering, collaborative filtering, probabilistic graphical modelling, and dimension reduction, with applications to customer segmentation, recommender systems, graph and time series mining, and anomaly detection. Prereq: Bus 41100. Students cannot enroll in BUSN 41204 if they have previously taken BUSN 20810.

**BUSN 41301 Statistical Insight into Marketing, Consulting, and Entrepreneurship (Fall)**
You decide to establish a start-up in marketing consulting. You search the Internet and find, to your dismay, well over 650 companies in that area, each one claiming to be the best and unique. In order to compete in this arena you need to be able to identify upcoming trends and new problems in the marketing area, AND to provide original, sound, fast, and applicable solutions to these problems. One such example, which many marketing consulting companies do not deal with, is the following shelf-planning problem.

*Imagine a customer in a deli on a Sunday morning intending to buy bagels. There are only two bagels on the shelf. What would you predict the person would do? Hurry up and buy the only remaining bagels before they are gone? Or consider the two bagels the least fresh, touched and left by all former customers, and therefore decide to wait for a fresher batch? As a consultant to the store manager, how would you determine the optimal number of bagels that should be on the shelf at a given time in order to avoid making customers reluctant to buy?*

As it turns out, the methodology covered by this course, which solves the above-mentioned problem, can also be used for the analysis of customer attrition, sales promotion, and more. Unlike marketing research, marketing consulting is a problem-solving endeavor that requires a great deal of specificity and is fueled by experience. This course is meant to give future consultants and entrepreneurs important tools and ways of thinking that are relevant for insightful consulting and useful in the practice of marketing consulting and beyond. The course addresses a variety of practical consulting problems and their solutions. Some examples are: (1) optimal shelf-planning (see the bagels example above); (2) analyzing customer attrition as a process (rather than as an event-driven phenomenon); (3) predicting a customer’s purchase behavior (buying intentions, buying propensity, etc.) from the customer’s patterns of media usage, lifestyle, political orientation, etc.; (4) analysis of satisfaction: how to create a VALID satisfaction scale, how to rank products by customer satisfaction, how to detect easy-to-please customers, etc.; (5) analysis of brand loyalty: how to measure loyalty, how to determine whether loyalty to certain brands exists, and how to quantify it; (6) optimizing predictive modeling when financial rewards and penalties are attached to correct and incorrect predictions, respectively. The course is taught in a way that emphasizes the interpretation of results rather than computations. Although this course uses statistical reasoning, it is NOT very mathematical in nature. To aid in the analysis, interactive, user-friendly R-based software containing innovative routines will be used. No programming skills are needed in this course, beyond the ability to click a key. Prereq: Bus 41000 (or 41100). Students who have not taken one of these courses but believe they have a strong background in statistics can still bid for the course with the explicit written permission of the instructor. Instructor consent is required for non-Booth students.

**BUSN 41600 Econometrics and Statistics Colloquium (Fall/Winter/Spring)**
Workshops in each academic area provide a forum for faculty, PhD students, and invited guests to present, discuss, and debate new research. Prereq: PhD students only. Instructor permission required for MBA students. BUSN 41600=ECON 51400.

**BUSN 41901/STAT 32400 Probability and Statistics (Fall)**
This Ph.D.-level course (together with 41902) provides a thorough introduction to classical and Bayesian statistical theory. The two-quarter sequence provides the probability and statistical background necessary for many of the advanced courses in the Chicago Booth curriculum. The central topic of Business 41901 is probability. Basic concepts in probability are covered, and an introduction to martingales is given. Homework assignments are given throughout the quarter. Prereq: One year of calculus. BUSN 41901=STAT 32400

**BUSN 41902/STAT 32900 Statistical Inference (Winter)**
This Ph.D.-level course is the second in a two-quarter sequence with Business 41901. The central topic is statistical inference using asymptotic approximations. We will cover linear regression models, the generalized method of moments, and time series. Time permitting, we will discuss factor models. Prereq: Business 41901

**BUSN 41903 Applied Econometrics (Spring)**
This Ph.D.-level course covers a variety of techniques used in econometric analysis. The class builds heavily on material developed in 41902, and students are strongly encouraged to take 41902 or an equivalent course before enrolling. Topics that may be covered include (i) heteroscedasticity- and correlation-robust inference methods, including HAC, clustering, bootstrap methods, and randomization inference; (ii) causal inference methods, including instrumental variables estimation, difference-in-differences estimation, and estimators of treatment effects under treatment effect heterogeneity; (iii) an introduction to nonparametric and high-dimensional statistical methods. Prereq: Business 41901 and 41902.

**BUSN 41910 Time-Series Analysis for Forecasting and Model Building (Winter)**
Forecasting plays an important role in business planning and decision-making. This Ph.D.-level course discusses time series models that have been widely used in business and economic data analysis and forecasting. Both the theory and the methods of the models are discussed. Real examples are used throughout the course to illustrate applications. The topics covered include: (1) stationary and unit-root non-stationary processes; (2) linear dynamic models, including autoregressive moving average models; (3) model building and data analysis; (4) prediction and forecast evaluation; (5) asymptotic theory for estimation, including unit-root theory; (6) models for time-varying volatility; (7) models for time-varying correlation, including Dynamic Conditional Correlation and time-varying factor models; (8) state-space models and the Kalman filter; and (9) models for high-frequency data. Prereq: Business 41901 or instructor consent. BUSN 41910=STAT 33500

**BUSN 41912 Applied Multivariate Analysis (Spring)**
This course covers the basic theory and methods for the analysis of multidimensional data. The ability to analyze multiple responses simultaneously opens up many applications. The topics covered in this course include descriptive statistics for multivariate data, basic properties of multivariate distributions such as the normal and Student-t, multivariate linear regression, principal component analysis for dimension reduction, factor analysis and dynamic factor models, canonical correlation analysis, discrimination and classification, independent component models, dimension reduction, and simple multiple time series models. Data mining and machine learning will be discussed if time permits. Applications in business and economics are emphasized. The R software will be used. Prereq: Business 41901 or 41902 or equivalent courses. BUSN 41912=STAT 32900

**CCTS 40500 Machine Learning & Advanced Analytics for Biomedicine**

The age of ubiquitous data is rapidly transforming scientific research, and advanced analytics powered by sophisticated learning algorithms is uncovering new insights into complex open problems in biology and biomedicine. The goal of this course is to provide an introductory overview of the key concepts in machine learning, outlining their potential applications in biomedicine. Beginning from basic statistical concepts, we will discuss concepts and implementations of standard and state-of-the-art classification and prediction algorithms, and go on to discuss more advanced topics in unsupervised learning, deep learning architectures, and stochastic time series analysis. We will also cover emerging ideas in data-driven causal inference, and demonstrate applications in uncovering etiological insights from large-scale clinical databases of electronic health records and publicly available sequence and omics datasets. The acquisition of hands-on skills will be emphasized over machine learning theory. On successfully completing the course, students will have acquired enough knowledge of the underlying machinery to intuit and implement solutions to non-trivial data science problems arising in biology and medicine.

Prerequisite(s): Rudimentary knowledge of probability theory and basic exposure to scripting languages such as Python or R are required. This course does not count toward the Biological Sciences major.

Equivalent Course(s): CCTS 20500, BIOS 29208

**CHDV 30102/MACS 50100/PBHS 43201/SOCI 30315/STAT 31900 Introduction to Causal Inference (Winter)**

This course is designed for graduate students and advanced undergraduate students from the social sciences, education, public health science, public policy, social service administration, and statistics who are involved in quantitative research and are interested in studying causality. The goal of this course is to equip students with basic knowledge of and analytic skills in causal inference. Topics for the course will include the potential outcomes framework for causal inference; experimental and observational studies; identification assumptions for causal parameters; potential pitfalls of using ANCOVA to estimate a causal effect; propensity score based methods, including matching, stratification, inverse-probability-of-treatment weighting (IPTW), marginal mean weighting through stratification (MMWS), and doubly robust estimation; the instrumental variable (IV) method; regression discontinuity design (RDD), including sharp RDD and fuzzy RDD; difference-in-differences (DID) and generalized DID methods for cross-sectional and panel data; and fixed effects models.

This course is a prerequisite for “Advanced Topics in Causal Inference” and “Mediation, moderation, and spillover effects.” (=MACS 50100, =PBHS 43201, =PLSC 30102, =SOCI 30315, =STAT 31900)

*PQ: Intermediate Statistics or equivalent, such as STAT 224/PBHS 324, PP 31301, BUS 41100, or SOC 30005.*

**ECON 31000 Empirical Analysis I (Fall)**
This course introduces students to the key tools of econometric analysis. It covers the basic OLS regression model, generalized least squares, asymptotic theory and hypothesis testing for maximum likelihood estimation, extremum estimators, instrumental variables, decision theory, and Bayesian inference.

**ECON 31100 Empirical Analysis II (Winter)**
This course develops methods for analyzing Markov specifications of dynamic economic models. Models with stochastic growth are accommodated and their properties analyzed. Methods for identifying macroeconomic shocks and their transmission mechanisms are developed. Related filtering methods for models with hidden states are studied. The properties of estimation and inference methods based on maximum likelihood and the generalized method of moments are derived. These econometric methods are applied to models from macroeconomics and financial economics.

**ECON 31200 Empirical Analysis III (Spring)**
The course will review some of the classical methods you were introduced to in previous quarters and give examples of their use in applied microeconomic research. Our focus will be on exploring and understanding data sets, evaluating predictions of economic models, and identifying and estimating the parameters of economic models. The methods we will build on include regression techniques, maximum likelihood, method of moments estimators, as well as some non-parametric methods. Lectures and homework assignments will seek to build proficiency in the correct application of these methods to economic research questions.

**ECON 31720 Applied Microeconometrics (Fall)**
This course is about empirical strategies that are commonly used in applied microeconomics. The topics will include: control variables (matching), instrumental variables, regression discontinuity and kink designs, panel data, difference-in-differences, and quantile regression. The emphasis of the course is on identification and practical implementation. The course also covers the shortcomings of commonly used tools, and discusses recent theoretical research aimed at addressing these deficiencies.

**ECON 31740 Optimization-Conscious Econometrics (Winter)**
Modern research in econometrics often intersects with machine learning and big data questions. Likewise, while the overlap of econometrics with optimization and operations research has traditionally been limited, previously intractable large-scale or combinatorially difficult econometric problems are now being solved using modern optimization software and heuristics. This opens up a rich research agenda and consequential new questions for econometricians. How can machine learning methods be used for econometric regression analysis and causal inference? How can modern optimization methods be applied to solve previously intractable econometric problems? What are the statistical consequences of changes made for numerical reasons? How does one do inference on the output of nonstandard optimization problems? At the heart of these new estimation and inference questions lies the need to design and understand estimators as the products of algorithms and optimization problems, not only as the minimizers of objective functions.

**ECON 31703 Topics in Econometrics (Spring)**
Graduate course covering recent research in the field of econometrics.

**ECON 31800 Advanced Econometrics (Fall)**
No description available at this time.

**MACS 30301 – Introduction to Bayesian Statistics (Fall)**
The goal of this course is to give students an overview of the theory and methods for data analysis using the Bayesian paradigm. Topics include: (1) foundations of Bayesian inference; (2) development of Bayesian models and prior choices; (3) analytical and simulation techniques for posterior estimation; (4) model choice and diagnostics; (5) sensitivity analysis; (6) an introduction to Markov chain Monte Carlo (MCMC) simulation; (7) an introduction to commonly used Bayesian estimation packages (R/JAGS/BUGS); and (8) applications of Bayesian analysis to real-world and political science problems.

**MACS 33002 – Introduction to Machine Learning (Winter)**
This course will introduce students to the foundations of machine learning. Building on a mathematical foundation, we will cover everything needed for getting up and running with any computational research project from a machine learning perspective, including: the basics and mechanics of a model, sampling methods, training, testing and tuning, comparing supervised vs. unsupervised learning, regularization techniques, decision trees, neural networks (artificial and convolutional), and various other models and algorithms contributing to a solid foundation of machine learning.

Prerequisites: prior statistical training (through regression, though ideally MLE/GLM); statistical computing (at least basic proficiency in R).

**MACS 40200 – Structural Estimation (Winter)**
Structural estimation refers to the estimation of model parameters by taking a theoretical model directly to the data. (This is in contrast to reduced-form estimation, which often entails estimating a linear model that is either explicitly or implicitly a simplified, linear version of a related theoretical model.) This class will survey a range of structural models, then teach students estimation approaches including the generalized method of moments and maximum likelihood estimation. We will then examine the strengths and weaknesses of both approaches in a series of examples from the fields of economics, political science, and sociology. We will also learn the simulated method of moments approach and explore its applications across the social sciences.

**MACS 40800 – Unsupervised Machine Learning (Fall)**
Though armed with rich datasets, many researchers are confronted with a lack of understanding of the *structure* of their data. Unsupervised machine learning offers researchers a suite of computational tools for uncovering the underlying, non-random structure that is assumed to exist in feature space. This course will cover prominent unsupervised machine learning techniques such as clustering, item response theory (IRT) models, multidimensional scaling, factor analysis, and other dimension reduction techniques. Further, the mechanics involved in unsupervised machine learning will also be covered, such as diagnosing the clusterability of a feature space (visually and mathematically), measures of distance and distance matrices, different algorithms based on data size (k-medoids/k-means vs. PAM vs. CLARA), visualizing patterns, and methods of validation (e.g., internal vs. external validation).

**MACS 60000 – Computational Content Analysis (Winter)**
A vast expanse of information about what people do, know, think, and feel lies embedded in text, and more of the contemporary social world lives natively within electronic text than ever before. These textual traces range from collective activity on the web, social media, instant messaging, and automatically transcribed YouTube videos to online transactions, medical records, digitized libraries, and government intelligence. This supply of text has elicited demand for natural language processing and machine learning tools to filter, search, and translate text into valuable data. The course will survey and practically apply many of the most exciting computational approaches to text analysis, highlighting both supervised methods that extend old theories to new data and unsupervised techniques that discover hidden regularities worth theorizing. These will be examined and evaluated on their own merits, and relative to the validity and reliability concerns of classical content analysis, the interpretive concerns of qualitative content analysis, and the interactional concerns of conversation analysis. We will also consider how these approaches can be adapted to content beyond text, including audio, images, and video. We will simultaneously review recent research that uses these approaches to develop social insight by exploring (a) collective attention and reasoning through the content of communication; (b) social relationships through the process of communication; and (c) social states, roles, and moves identified through heterogeneous signals within communication. The course is structured around gaining understanding of and experimenting with text analytical tools, deploying those tools and interpreting their output in the context of individual research projects, and assessing contemporary research within this domain. Class discussion and assignments will focus on how to use, interpret, and combine computational techniques in the context of compelling social science research investigations.

**MAPS 31701 Data Analytics & Statistics (Fall 2019)**

This course is designed for graduate students and advanced undergraduate students and aims to provide a strong foundation in the statistical and data analyses commonly used in the behavioral and social sciences. Topics include logistic regression, statistical inference, chi-square tests, analysis of variance, and repeated measures models. In addition, this course places greater emphasis on developing practical skills, including the ability to conduct common analyses using statistical software. You will learn how to build models to investigate your data, formulate hypothesis tests as comparisons between statistical models, and critically evaluate model assumptions. The goal of the course is for students to be able to define and use descriptive and inferential statistics to analyze and interpret statistical findings. Open only to graduate students and third- and fourth-year undergraduates. Undergraduates must have instructor consent.

**MAPS 31702 Data Science (Winter 2020)**

This course is a graduate-level methods class that aims to train students to solve real-world statistical problems. The goal of the course is for students to be able to choose an appropriate statistical method for a given data analysis problem and to communicate their results clearly and succinctly. Students will gain extensive hands-on experience analyzing real data through practical classes.

**MAPS 31760. Concepts, Assumptions, Data and Inference in Quantitative Research Methodology (Fall)**

The main purpose of this course is to provide instruction on core principles of quantitative research methodology in the social sciences. This course will equip graduate students with the conceptual tools of quantitative research that form the foundation for data management, data analysis, and inference. We will examine a series of topics related to measurement, sampling, hypotheses, data structure, and model interpretation that scholars encounter when designing any project that uses quantitative data for empirical research. My main target audience is graduate students enrolled in the Masters Program in Social Sciences who want to use quantitative research techniques for their MS thesis project. Students enrolled in this course are expected to have taken at least one upper-level undergraduate or graduate course in statistics (or equivalent) in the past two years. Students who are not planning to use quantitative methods in the future can also enroll in this course to develop proficiency in reading research publications and scholarly reports that use quantitative tools.

**PBHS 31001 Epidemiologic Methods (Winter)**

This course expands on the material presented in “Clinical Epidemiology/Epidemiology and Population Health,” further exploring issues in the conduct of epidemiologic studies. The student will learn the application of both stratified and multivariate methods to the analysis of epidemiologic data. The final project will be to write the “specific aims” and “methods” sections of a research proposal on a topic of the student’s choice.

PQ: PBHS 30700 or PBHS 30910 and PBHS 32400/STAT 22400, applied statistics courses through multivariate regression, or consent of instructor. ID: STAT 35700

**PBHS 33300/STAT 36900 Applied Longitudinal Data Analysis. 100 Units. (Spring)**

Longitudinal data consist of multiple measures over time on a sample of individuals. This type of data occurs extensively in both observational and experimental biomedical and public health studies, as well as in studies in sociology and applied economics. This course will provide an introduction to the principles and methods for the analysis of longitudinal data. Although some supporting statistical theory will be given, the emphasis will be on data analysis and interpretation of models for longitudinal data. Problems will be motivated by applications in epidemiology, clinical medicine, health services research, and disease natural history studies.

Prerequisite(s): PBHS 32400/STAT 22400 or equivalent, and PBHS 32600/STAT 22600 or PBHS 32700/STAT 22700 or equivalent; or consent of instructor. Equivalent Course(s): PBHS 33300

**PBHS 33400 Multilevel Modeling (Fall)**

This course will focus on the analysis of multilevel data in which subjects are nested within clusters (e.g., health care providers, hospitals). The focus will be on clustered data, and several extensions to the basic two-level multilevel model will be considered including three-level, cross-classified, multiple membership, and multivariate models. In addition to models for continuous outcomes, methods for non-normal outcomes will be covered, including multilevel models for dichotomous, ordinal, nominal, time-to-event, and count outcomes. Some statistical theory will be given, but the focus will be on application and interpretation of the statistical analyses.

PQ: PBHS 32400 and PBHS 32700 or consent of instructor.

**PBHS 33500 Statistical Applications (Fall)**

This course provides a transition between statistical theory and practice. The course will cover statistical applications in medicine, mental health, environmental science, analytical chemistry, and public policy. Lectures are oriented around specific examples from a variety of content areas. Opportunities for the class to work on interesting applied problems presented by U of C faculty will be provided. Although an overview of relevant statistical theory will be presented, emphasis is on the development of statistical solutions to interesting applied problems.

PQ: PBHS 32400/STAT 22400 or equivalent, and PBHS 32600/STAT 22600, or PBHS 32700/STAT 22700 or equivalent; or consent of instructor. ID: STAT 35800

**PBHS 40500 Advanced Epidemiologic Methods (Spring)**

This course examines some features of study design, but is primarily focused on analytic issues encountered in epidemiologic research. The objective of this course is to enable students to conduct thoughtful analysis of epidemiologic and other population research data. Concepts and methods that will be covered include: matching, sampling, conditional logistic regression, survival analysis, ordinal and polytomous logistic regressions, multiple imputation, and screening and diagnostic test evaluation. The course follows in sequence the material presented in “Epidemiologic Methods.”

PQ: PBHS 31001

**PLSC 30700. Introduction to Linear Models. 100 Units. (Winter)**

This course will provide an introduction to the linear model, the dominant form of statistical inference in the social sciences. The goals of the course are to teach students the statistical methods needed to pursue independent large-n research projects and to develop the skills necessary to pursue further methods training in the social sciences. Part I of the course reviews the simple linear model (as seen in STAT 22000 or its equivalent) with attention to the theory of statistical inference and the derivation of estimators. Basic calculus and linear algebra will be introduced. Part II extends the linear model to the multivariate case. Emphasis will be placed on model selection and specification. Part III examines the consequences of data that is “poorly behaved” and how to cope with the problem. Depending on time, Part IV will introduce special topics like systems of simultaneous equations, logit and probit models, time-series methods, etc. Little prior knowledge of math or statistics is expected, but students are expected to work hard to develop the tools introduced in class.

**PLSC 43100. Maximum Likelihood. 100 Units. (Fall)**

The purpose of this course is to familiarize students with the estimation and interpretation of maximum likelihood, a statistical method that permits a close linkage of deductive theory and empirical estimation. Problems considered in this course include: models of dichotomous choice, such as turnout and vote choice; models of limited categorical data, such as those for multi-party elections and survey responses; models for counts of uncorrelated events, such as executive orders and book burnings; models for duration, such as the length of parliamentary coalitions or the tenure of bureaucracies; models for compositional data, such as allocation of time by bureaucrats to task and district vote shares; and models for latent variables, such as for predispositions. The emphasis in this course will be on the extraction of information about political and social phenomena, not upon properties of estimators. Prerequisite(s): PLSC 30700 Intro to Linear Models or consent of instructor.

**PLSC 43401 Mathematical Foundations of Political Methodology (Fall)**

This is a first course on the theory and practice of mathematical methods in social science research. These mathematical and computer skills are needed for the quantitative and formal modeling courses offered in the political science department, and are increasingly necessary for courses in American Politics, Comparative Politics, and International Relations. We will cover mathematical techniques (linear algebra, calculus, probability) and methods of logical and statistical inference (proofs and statistics).

**PLSC 57200. Network Analysis. 100 Units. (Fall)**

This seminar explores the sociological utility of the network as a unit of analysis. How do the patterns of social ties in which individuals are embedded differentially affect their ability to cope with crises, their decisions to move or change jobs, their eagerness to adopt new attitudes and behaviors? The seminar group will consider (a) how the network differs from other units of analysis, (b) structural properties of networks, consequences of flows (or content) in network ties, and (c) dynamics of those ties. Equivalent Course(s): SOCI 50096

**PPHA 31002 Statistics for Data Analysis I (Fall)**

This is the first quarter of the statistics sequence at the Harris School. This course aims to provide students with a basic understanding of statistical analysis for policy research. This course makes no assumptions about prior knowledge, apart from basic mathematics skills. Examples will draw on current events and policy debates when possible.

**PPHA 31202, Advanced Statistics for Data Analysis I (Fall)**

This course focuses on the statistical concepts and tools used to study associations between variables and to draw causal inferences. It introduces students to regression analysis and explores its uses in policy analysis. This course assumes greater statistical sophistication on the part of students than PPHA 31002 does.

**PPHA 34600, Program Evaluation (Fall/Winter/Spring)**

The goal of this course is to introduce students to program evaluation and provide an overview of current issues and methods in impact evaluation. We will focus on estimating the causal impacts of programs and policy using social experiments, panel data methods, instrumental variables, regression discontinuity designs, and matching techniques. We will discuss applications and examples from the fields of education, demography, health, crime, job training, and others. Prerequisites: PPHA 31001 or PPHA 31002 and PPHA 31101 or PPHA 31102 or equivalent statistics coursework.

**PPHA 41400, Applied Regression Analysis: Analysis of Microeconomic Data (Spring)**

This course is based on the theory and practice of econometrics. Its intention is to provide hands-on experience with econometric analysis, without neglecting sound knowledge of econometric theory. It is designed to help students acquire skills that make them effective consumers and producers of empirical research in public policy, economics, and related fields. Throughout the course, concepts will be illustrated with applications in economics. Various aspects will be covered in the course, in particular: i) development of testable econometric models; ii) use of appropriate data; and iii) specification and estimation of econometric models.

**PPHA 41420, Multilevel Regression Modeling for Public Policy (Winter)**

Grouped data, such as students within schools or workers within firms, are ubiquitous in public policy. Both to satisfy the assumptions of regression and to build models that yield realistic inferences, we should include group-level intercepts and slopes in our models. Traditionally this was accomplished using fixed effects and their interactions with covariates. However, because we commonly have few observations per group, this approach can yield noisy or degenerate estimates. We will introduce a Bayesian perspective on regression modeling and use it to develop multilevel regression models (also known as hierarchical or mixed-effects models). Under certain assumptions, these models allow us to partially pool information across groups in order to efficiently model the group structure even when the number of observations within each group is small. Recent advances in computing have made the estimation of multilevel models much more practical. Drawing on examples from the fields of epidemiology, education, and political science, we will study applications of multilevel models to heterogeneous treatment effects, small area estimation, longitudinal data, and prediction. Familiarity with R and linear regression is assumed.

**PPHA 41600, Survey Research Methodology (Winter)**

The goal of this course is for students to learn about the methods used to collect publicly available survey data that can be used for policy research, so that they can appropriately use these data to answer policy-relevant questions. Students will learn about the methods used to collect survey data, how to develop researchable policy questions that can be answered with the survey data, and about the limitations of the survey data for answering policy research questions. In order to analyze policy questions using available survey data, students will also learn about actual survey instruments, survey sample designs, survey data processing, and survey data systems that the major public policy relevant surveys use. The course will also examine specific measurement and analysis issues that are of interest to policy research (e.g., measuring public program enrollment and public program eligibility simulation). By the end of the course each student will understand the methods used to collect survey data, have developed a researchable policy question, carried out the appropriate analysis to answer the question, produced high-quality analytical tables, and written up descriptions of the methods used to produce the numbers in the tables in a style that is consistent with professional policy research.

**PPHA 42000, Applied Econometrics I (PhD Level) (Winter)**

This course is the first in a two-part sequence designed to cover applied econometrics and regression methods at a fairly advanced level. This course provides a theoretical analysis of linear regression models for applied researchers. It considers analytical issues caused by violations of the Gauss-Markov assumptions, including linearity (functional form), heteroscedasticity, and panel data. Alternative estimators are examined to deal with each. Prerequisites: This course is intended for first- or second-year Ph.D. students or advanced master's-level students who have taken the Statistics 24400/24500 sequence. Familiarity with matrix algebra is necessary.

**PSYC 36211. Mathematical Methods for Biological Sciences II. (Winter)**

This course is a continuation of BIOS 26210. The topics start with optimization problems, such as nonlinear least squares fitting, principal component analysis, and sequence alignment. Stochastic models are introduced, such as Markov chains, birth-death processes, and diffusion processes, with applications including hidden Markov models, tumor population modeling, and networks of chemical reactions. In computer labs, students learn optimization methods and stochastic algorithms, e.g., Markov chain Monte Carlo and the Gillespie algorithm. Students complete an independent project on a topic of their interest. Prerequisite(s): BIOS 26210 or equivalent.

**SOCI 30004 Statistical Methods of Research 1. (Winter)**

This course provides a comprehensive introduction to widely used quantitative methods in sociology and related social sciences. Topics covered include analysis of variance and multiple regression, considered as they are used by practicing social scientists.

**SOCI 30005 Statistical Methods of Research 2. (Spring)**

Social scientists regularly ask questions that can be answered with quantitative data from a population-based sample. For example, how much more income do college graduates earn compared to those who do not attend college? Do men and women with similar levels of training and who work in similar jobs earn different incomes? Why do children who grow up in different family or neighborhood environments perform differently in school? To what extent do individuals from different socioeconomic backgrounds hold different types of political attitudes and engage in different types of political behavior? This course explores statistical methods that can be used to answer these and many other questions of interest to social scientists. The main objectives are to provide students with a firm understanding of linear regression and generalized linear models and with the technical skills to implement these methods in practice.

**SOCI 30112 Application of Hierarchical Linear Models. (Spring)**

A number of diverse methodological problems, such as correlates of change, analysis of multi-level data, and certain aspects of meta-analysis, share a common feature: a hierarchical structure. The hierarchical linear model offers a promising approach to analyzing data in these situations. This course will survey the methodological literature in this area and demonstrate how the hierarchical linear model can be applied to a range of problems.

**SOCI 30253/MACS 54000 Introduction to Spatial Data Science. (Fall)**

Spatial data science consists of a collection of concepts and methods drawn from both statistics and computer science that deal with accessing, manipulating, visualizing, exploring and reasoning about geographical data. The course introduces the types of spatial data relevant in social science inquiry and reviews a range of methods to explore these data. Topics covered include formal spatial data structures, geovisualization and visual analytics, rate smoothing, spatial autocorrelation, cluster detection and spatial data mining. An important aspect of the course is to learn and apply open source software tools, including R and GeoDa.

**SOCI 40103 Event History Analysis. (Spring)**

This course introduces the methods of event history analysis, which allow for the analysis of duration data. Nonparametric methods and parametric regression models are available to investigate the influence of covariates on the duration until a certain event occurs. Applications of these methods will be discussed, e.g., duration until marriage, social mobility processes, organizational mortality, and firm tenure.

**SOCI 40217/GEOG 40217/MACS 55000 Spatial Regression Analysis. (Spring)**

This course covers statistical and econometric methods specifically geared to the problems of spatial dependence and spatial heterogeneity in cross-sectional data. The main objective of the course is to gain insight into the scope of spatial regression methods, to be able to apply them in an empirical setting, and to properly interpret the results of spatial regression analysis. While the focus is on spatial aspects, the types of methods covered have general validity in statistical practice. The course covers the specification of spatial regression models in order to incorporate spatial dependence and spatial heterogeneity, as well as different estimation methods and specification tests to detect the presence of spatial autocorrelation and spatial heterogeneity. Special attention is paid to the application to spatial models of generic statistical paradigms, such as Maximum Likelihood, the Generalized Method of Moments, and the Bayesian perspective. An important aspect of the course is the application of open source software tools such as R, GeoDa and PySal to solve empirical problems.

**SOCI 50123 Seminar: Elegant Models for Social Structure, Probability and Non-Probability Applications. (Fall)**

We investigate attempts to use relational data to build mathematically compelling models of social structure. Beginning with Harrison White’s mathematization of Lévi-Strauss, we investigate role algebras, before turning to probabilistic models. We examine attempts to specify a null distribution for network graphs, and then ways of linking observed graph statistics to models of structure. We then examine the idea of Markov graphs, relying on Besag, and then the application to networks. At this point, we shift to an exploration of the practical applications of different means of looking at probability models with structural and nonstructural covariates, relying on example data sets and simulation, to compare the capacity of pseudo-likelihood and MCMC maximum likelihood methods to produce correct answers to realistic questions, for conventional network data, for multinetwork data, and for temporal data. This last part will be a learning experience for all of us.

**SOSC 36006: Foundations of Statistical Theory (Fall)**

This course is designed for graduate and advanced undergraduate students who aim to develop a conceptual understanding of the fundamentals of statistical theory underlying a wide array of quantitative research methods. The course introduces students to probability and statistical theory and emphasizes the connection between statistical theory and the routine practice of statistical applications in quantitative research. Students will gain a basic understanding of the concepts of joint, marginal, and conditional probability, Bayes' rule, probability distributions of random variables, principles of statistical inference, sampling distributions, and estimation strategies. The course can serve as preparation for mathematical statistics courses such as STAT 244 (Statistical Theory and Methods 1) and as a theoretical foundation for various advanced quantitative methods courses in the social, behavioral, and health sciences. Prereq: Basic knowledge of linear algebra and calculus, specifically differentiation and integration, is necessary to understand the material on continuous distributions, multivariate distributions, and functions of random variables.

**SOSC 36007: Overview of Quantitative Methods in the Social and Behavioral Sciences (Winter)**

The course is designed to offer an overview of and present the common logic underlying a wide range of methods developed for rigorous quantitative inquiry in the social and behavioral sciences. Students will become familiar with various research designs, measurement, and advanced analytic strategies broadly applicable to theory-driven and data-informed quantitative research in many disciplines. Moreover, they will understand the inherent connections between different statistical methods, and will become aware of the strengths and limitations of each. In addition, this course will provide a gateway to the numerous offerings of advanced quantitative methods courses. It is suitable for undergraduate and graduate students at any stage of their respective programs.

Prereq: Introductory level statistics

**SOSC 36008: Principles of Measurement (Spring)**

Accurate measurement of key theoretical constructs with known and consistent psychometric properties is one of the essential steps in quantitative social and behavioral research. However, measurement of phenomena that are not directly observable (such as psychological attributes, perceptions of organizational climate, or quality of services) is difficult. Much of the research in psychometrics has been developed in an attempt to properly define and quantify such phenomena. This course is designed to introduce students to the relevant concepts, principles, and methods underlying the construction and interpretation of tests or measures. It provides in-depth coverage of test reliability and validity, topics in test theory, and statistical procedures applicable to psychometric methods. Such understanding is essential for rigorous practice in measurement as well as for proper interpretation of research. The course is highly recommended for students who plan to pursue careers in academic research or applied practice involving the use or development of tests or measures in the social and behavioral sciences.

Prereq: Coursework or background experience in statistics through inferential statistics and linear regression.

**SOSC 36009 Introductory Statistical Methods and Applications (Fall)**

This course introduces and applies fundamental statistical concepts, principles, and procedures to the analysis of data in social and behavioral sciences. Students will learn computation, interpretation, and application of commonly used descriptive, correlational, and inferential statistical procedures as they relate to social and behavioral research. The course will integrate the use of Stata as a tool for these techniques. This course is equivalent to SOSC 20004/30004 (Statistical Methods of Research I), CHDV 20101/30101 (Applied Statistics in Human Development Research), PSYCH 20100 (Psychological Statistics), and other introductory level applied statistics courses.

**STAT 22000. Statistical Methods and Applications. 100 Units. (Fall/Winter/Spring)**

This course introduces statistical techniques and methods of data analysis, including the use of statistical software. Examples are drawn from the biological, physical, and social sciences. Students are required to apply the techniques discussed to data drawn from actual research. Topics include data description, graphical techniques, exploratory data analyses, random variation and sampling, basic probability, random variables and expected values, confidence intervals and significance tests for one- and two-sample problems for means and proportions, chi-square tests, linear regression, and, if time permits, analysis of variance.

Prerequisite(s): MATH 13100 or 15100 or 15200 or 15300 or 16100 or 16110 or 15910 or 19520 or 19620 or 20250 or 20300 or 20310.

Note(s): Students may count either STAT 22000 or STAT 23400, but not both, toward the forty-two credits required for graduation. Students with credit for STAT 23400 not admitted. This course meets one of the general education requirements in the mathematical sciences. Only one of STAT 20000, STAT 20010, or STAT 22000 can count toward the general education requirement in the mathematical sciences.

**STAT 22200. Linear Models and Experimental Design. 100 Units. (Spring)**

This course covers principles and techniques for the analysis of experimental data and the planning of the statistical aspects of experiments. Topics include linear models; analysis of variance; randomization, blocking, and factorial designs; confounding; and incorporation of covariate information.

Prerequisite(s): STAT 22000 or 23400 with a grade of at least C+, or STAT 22400 or 22600 or 24500 or 24510 or PBHS 32100, or AP Statistics credit for STAT 22000. Also two quarters of calculus (MATH 13200 or 15200 or 15300 or 16200 or 16210 or 15910 or 19520 or 19620 or 20250 or 20300 or 20310).

**STAT 22400/PBHS 32400 Applied Regression Analysis. 100 Units. (Fall/Spring)**

This course introduces the methods and applications of fitting and interpreting multiple regression models. The primary emphasis is on the method of least squares and its many varieties. Topics include the examination of residuals, the transformation of data, strategies and criteria for the selection of a regression equation, the use of dummy variables, tests of fit, nonlinear models, biases due to excluded variables and measurement error, and the use and interpretation of computer package regression programs. The techniques discussed are illustrated by many real examples involving data from both the natural and social sciences. Matrix notation is introduced as needed.

Prerequisite: PBHS 32100.

Equivalent Course(s): PBHS 32400 a grade of at least C, or STAT 22200 or 22600 or 24500 or 24510 or PBHS 32100, or AP Statistics credit for STAT 22000. Also two quarters of calculus (MATH 13200 or 15200 or 15300 or 16200 or 16210 or 15910 or 19520 or 19620 or 20250 or 20300 or 20310).

Equivalent Course(s): PBHS 32400

**STAT 22600/PBHS 32600 Analysis of Categorical Data. 100 Units. (Winter)**

This course covers statistical methods for the analysis of qualitative and counted data. Topics include description and inference for binomial and multinomial data using proportions and odds ratios; multi-way contingency tables; generalized linear models for discrete data; logistic regression for binary responses; multi-category logit models for nominal and ordinal responses; loglinear models for counted data; and inference for matched-pairs and correlated data. Applications and interpretations of statistical models are emphasized.

Prerequisite(s): STAT 22000 or 23400 with a grade of at least C+, or STAT 22200 or 22400 or 24500 or 24510 or PBHS 32100, or AP Statistics credit for STAT 22000. Also two quarters of calculus (MATH 13200 or 15200 or 15300 or 16200 or 16210 or 15910 or 19520 or 19620 or 20250 or 20300 or 20310).

Equivalent Course(s): PBHS 32600

**STAT 22700/PBHS 32700 Biostatistical Methods. 100 Units. (Winter)**

This course is designed to provide students with tools for analyzing categorical, count, and time-to-event data frequently encountered in medicine, public health, and related biological and social sciences. This course emphasizes application of the methodology rather than statistical theory (e.g., recognition of the appropriate methods; interpretation and presentation of results). Methods covered include contingency table analysis, Kaplan-Meier survival analysis, Cox proportional-hazards survival analysis, logistic regression, and Poisson regression.

Prerequisite(s): PBHS 32400, STAT 22400 or STAT 24500 or equivalent or consent of instructor.

Equivalent Course(s): PBHS 32700

**STAT 23400. Statistical Models and Methods. 100 Units. (Fall/Winter/Spring)**

This course is recommended for students throughout the natural and social sciences who want a broad background in statistical methodology and exposure to probability models and the statistical concepts underlying the methodology. Probability is developed for the purpose of modeling outcomes of random phenomena. Random variables and their expectations are studied, including means and variances of linear combinations and an introduction to conditional expectation. Binomial, Poisson, normal, and other standard probability distributions are considered. Some probability models are studied mathematically, and others are studied via computer simulation. Sampling distributions and related statistical methods are explored mathematically, studied via simulation, and illustrated on data. Methods include, but are not limited to, inference for means and proportions for one- and two-sample problems, two-way tables, correlation, and simple linear regression. Graphical and numerical data description are used for exploration, communication of results, and comparing mathematical consequences of probability models and data. Mathematics employed is to the level of single-variable differential and integral calculus and sequences and series.

Prerequisite(s): MATH 13300 or 15300 or 16200 or 16210 or 15910 or 19520 or 19620 or 20250 or 20300 or 20310.

Note(s): Students may count either STAT 22000 or STAT 23400, but not both, toward the forty-two credits required for graduation. Students with AP Statistics credit for STAT 22000 will forego that credit by completing STAT 23400.

**STAT 24400/STAT 30030 Statistical Theory and Methods I. 100 Units. (Fall/Winter)**
This course is the first quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course covers tools from probability and the elements of statistical theory. Topics include the definitions of probability and random variables, binomial and other discrete probability distributions, normal and other continuous probability distributions, joint probability distributions and the transformation of random variables, principles of inference (including Bayesian inference), maximum likelihood estimation, hypothesis testing and confidence intervals, likelihood ratio tests, multinomial distributions, and chi-square tests. Examples are drawn from the social, physical, and biological sciences. The coverage of topics in probability is limited and brief, so students who have taken a course in probability find reinforcement rather than redundancy. Students who have already taken STAT 25100 have the option to take STAT 24410 (if offered) instead of STAT 24400.

Prerequisite(s): (MATH 19520 or MATH 20000 with a grade of B or better), or MATH 16300 or 16310 or 20250 or 20300 or 20310 or 20700 or STAT 24300 or PHYS 22100.

Note(s): Some previous experience with statistics and/or probability is helpful but not required. Concurrent or prior linear algebra (MATH 19620 or 20250 or STAT 24300 or equivalent) is recommended for students continuing to STAT 24500. Students may count either STAT 24400 or STAT 24410, but not both, toward the forty-two credits required for graduation.

**STAT 24500/STAT 30040 Statistical Theory and Methods II. 100 Units. (Winter/Spring)**
This course is the second quarter of a two-quarter systematic introduction to the principles and techniques of statistics, as well as to practical considerations in the analysis of data, with emphasis on the analysis of experimental data. This course continues from either STAT 24400 or STAT 24410 and covers statistical methodology, including the analysis of variance, regression, correlation, and some multivariate analysis. Some principles of data analysis are introduced, and an attempt is made to present the analysis of variance and regression in a unified framework. Statistical software is used.

Prerequisite(s): Linear algebra (MATH 19620 or 20250 or STAT 24300 or PHYS 22100 or equivalent) and (STAT 24400 or STAT 24410).

Note(s): Students may count either STAT 24500 or STAT 24510, but not both, toward the forty-two credits required for graduation.

**STAT 24410. Statistical Theory and Methods Ia. 100 Units. (Fall)**
This course is the first quarter of a two-quarter sequence providing a principled development of statistical methods, including practical considerations in applying these methods to the analysis of data. The course begins with a brief review of probability and some elementary stochastic processes, such as Poisson processes, that are relevant to statistical applications. The bulk of the quarter covers principles of statistical inference from both frequentist and Bayesian points of view. Specific topics include maximum likelihood estimation, posterior distributions, confidence and credible intervals, principles of hypothesis testing, likelihood ratio tests, multinomial distributions, and chi-square tests. Additional topics may include diagnostic plots, bootstrapping, a critical comparison of Bayesian and frequentist inference, and the role of conditioning in statistical inference. Examples are drawn from the social, physical, and biological sciences. The statistical software package R will be used to analyze datasets from these fields and instruction in the use of R is part of the course.

Prerequisite(s): STAT 25100 or STAT 25150 or MATH 23500.

Note(s): Some previous experience with statistics is helpful but not required. Concurrent or prior linear algebra (MATH 19620 or 20250 or STAT 24300 or equivalent) is recommended for students continuing to STAT 24510. Students may count either STAT 24400 or STAT 24410, but not both, toward the forty-two credits required for graduation.

Equivalent Course(s): STAT 30030

**STAT 24510. Statistical Theory and Methods IIa. 100 Units. (Winter)**
This course is a continuation of STAT 24410. The focus is on theory and practice of linear models, including the analysis of variance, regression, correlation, and some multivariate analysis. Additional topics may include bootstrapping for regression models, nonparametric regression, and regression models with correlated errors.

Prerequisite(s): STAT 24410 and linear algebra (MATH 19620 or MATH 20250 or STAT 24300 or PHYS 22100 or equivalent).

Note(s): Students may count either STAT 24500 or STAT 24510, but not both, toward the forty-two credits required for graduation.

Equivalent Course(s): STAT 30040

**STAT 24620/32950 Multivariate Statistical Analysis: Applications and Techniques. 100 Units. (Spring)**
This course focuses on applications and techniques for the analysis of multivariate and high-dimensional data. Beginning subjects cover common multivariate techniques and dimension reduction, including principal component analysis, factor models, canonical correlation, multidimensional scaling, discriminant analysis, clustering, and correspondence analysis (if time permits). Further topics on statistical learning for high-dimensional data and complex structures include penalized regression models (LASSO, ridge, elastic net), sparse PCA, independent component analysis, Gaussian mixture models, Expectation-Maximization methods, and random forests. Theoretical derivations will be presented with emphasis on motivations, applications, and hands-on data analysis.

Prerequisite(s): (STAT 24300 or MATH 20250) and (STAT 24500 or STAT 24510). Graduate students in Statistics or Financial Mathematics can enroll without prerequisites.

Note(s): Linear algebra at the level of STAT 24300. Knowledge of probability and statistical estimation techniques (e.g., maximum likelihood and linear regression) at the level of STAT 24400-24500.

Equivalent Course(s): STAT 32950

**STAT 25100. Introduction to Mathematical Probability. 100 Units. (Fall/Spring)**
This course covers fundamentals and axioms; combinatorial probability; conditional probability and independence; binomial, Poisson, and normal distributions; the law of large numbers and the central limit theorem; and random variables and generating functions.

Prerequisite(s): MATH 16300, MATH 16310, MATH 20500, MATH 20510, or MATH 20900 (no grade requirement); or MATH 19520 or MATH 20000 together with one of the following: a minimum grade of B-, a STAT major, or current enrollment in the prerequisite course; or consent of instructor.

Note(s): Students may count either STAT 25100 or STAT 25150, but not both, toward the forty-two credits required for graduation.

**STAT 25150. Introduction to Mathematical Probability-A. 100 Units. (Fall)**
This course covers fundamentals and axioms; combinatorial probability; conditional probability and independence; binomial, Poisson, and normal distributions; the law of large numbers and the central limit theorem; and random variables and generating functions.

Prerequisite(s): (MATH 16300 or MATH 16310 or MATH 20500 or MATH 20510, with a minimum grade of A-), or (MATH 20900 with no grade requirement), or consent of instructor.

Note(s): Students may count either STAT 25100 or STAT 25150, but not both, toward the forty-two credits required for graduation.

Equivalent Course(s): MATH 25150

**STAT 25300. Introduction to Probability Models. 100 Units. (Winter)**
This course introduces stochastic processes as models for a variety of phenomena in the physical and biological sciences. Following a brief review of basic concepts in probability, we introduce stochastic processes that are widely used in scientific applications (e.g., discrete-time Markov chains, the Poisson process, continuous-time Markov processes, renewal processes, and Brownian motion).

Prerequisite(s): STAT 24400 or STAT 24410 or STAT 25100 or STAT 25150

Equivalent Course(s): STAT 31700

**STAT 26100. Time Dependent Data. 100 Units. (Fall)**
This course considers the modeling and analysis of data that are ordered in time. The main focus is on quantitative observations taken at evenly spaced intervals and includes both time-domain and spectral approaches.

Prerequisite(s): STAT 24500 with a grade of B- or better, or STAT 24510 with a grade of C+ or better, is required; alternatively, STAT 22400 with a grade of B- or better and exposure to multivariate calculus (MATH 16300 or MATH 16310 or MATH 19520 or MATH 20000 or MATH 20500 or MATH 20510 or MATH 20800). Graduate students in Statistics or Financial Mathematics can enroll without prerequisites. Some previous exposure to Fourier series is helpful but not required.

Equivalent Course(s): STAT 33600

**STAT 27400. Nonparametric Inference. 100 Units. (Fall)**
Nonparametric inference is about developing statistical methods and models that make weak assumptions. A typical nonparametric approach estimates a nonlinear function from an infinite-dimensional space rather than a linear model from a finite-dimensional space. This course gives an introduction to nonparametric inference, with a focus on density estimation, regression, confidence sets, orthogonal functions, random processes, and kernels. The course treats nonparametric methodology and its use, together with theory that explains the statistical properties of the methods.

Prerequisite(s): STAT 24400 or STAT 24410 with a grade of B- or better is required; alternatively, STAT 22400 with a grade of B+ or better and exposure to multivariate calculus (MATH 16300 or MATH 16310 or MATH 19520 or MATH 20000 or MATH 20500 or MATH 20510 or MATH 20800) and linear algebra (MATH 19620 or MATH 20250 or STAT 24300 or equivalent). Master’s students in Statistics can enroll without prerequisites.

Equivalent Course(s): STAT 37400

**STAT 27700/CMSC 25300 Mathematical Foundations of Machine Learning. 100 Units. (Fall)**
This course is an introduction to the mathematical foundations of machine learning that focuses on matrix methods and features real-world applications ranging from classification and clustering to denoising and data analysis. Mathematical topics covered include linear equations, regression, regularization, the singular value decomposition, and iterative algorithms. Machine learning topics include the lasso, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Students are expected to have taken calculus and have exposure to numerical computing (e.g., Matlab, Python, Julia, R).

Prerequisite(s): CMSC 12200 or CMSC 15200 or CMSC 16200, and the equivalent of two quarters of calculus (MATH 13200 or higher).

Equivalent Course(s): CMSC 25300

**STAT 27850/30850 Multiple Testing, Modern Inference, and Replicability. 100 Units. (Winter)**
This course examines the problems of multiple testing and statistical inference from a modern point of view. High-dimensional data is now common in many applications across the biological, physical, and social sciences. With this increased capacity to generate and analyze data, classical statistical methods may no longer ensure the reliability or replicability of scientific discoveries. We will examine a range of modern methods that provide statistical inference tools in the context of modern large-scale data analysis. The course will have weekly assignments as well as a final project, both of which will have theoretical and computational components.

Prerequisite(s): STAT 24400 or STAT 24410 or consent of instructor.

Equivalent Course(s): STAT 30850

**STAT 30100. Mathematical Statistics-1. 100 Units. (Winter)**
This course is part of a two-quarter sequence on the theory of statistics. Topics will include exponential, curved exponential, and location-scale families; mixtures, hierarchical, and conditional modeling including compatibility of conditional distributions; principles of estimation; identifiability, sufficiency, minimal sufficiency, ancillarity, completeness; properties of the likelihood function and likelihood-based inference, both univariate and multivariate, including examples in which the usual regularity conditions do not hold; elements of Bayesian inference and comparison with frequentist methods; and multivariate information inequality. Part of the course will be devoted to elementary asymptotic methods that are useful in the practice of statistics, including methods to derive asymptotic distributions of various estimators and test statistics, such as Pearson’s chi-square, standard and nonstandard asymptotics of maximum likelihood estimators and Bayesian estimators, asymptotics of order statistics and extreme order statistics, Cramér’s theorem including situations in which the second-order term is needed, and asymptotic efficiency. Other topics (e.g., methods for dependent observations) may be covered if time permits.

Prerequisite(s): STAT 30400 or consent of instructor

**STAT 30200. Mathematical Statistics-2. 100 Units. (Spring)**
This course continues the development of Mathematical Statistics, with an emphasis on hypothesis testing. Topics include comparison of Bayesian and frequentist hypothesis testing; admissibility of Bayes’ rules; confidence and credible sets; likelihood ratio tests and their asymptotics; Bayes factors; methods for assessing predictions for normal means; shrinkage and thresholding methods; sparsity; shrinkage as an example of empirical Bayes; multiple testing and false discovery rates; Bayesian approach to multiple testing; sparse linear regressions (subset selection and LASSO, proof of estimation errors for LASSO, Bayesian perspective of sparse regressions); and Bayesian model averaging.

Prerequisite(s): STAT 24500 or STAT 30100

**STAT 30400. Distribution Theory. 100 Units. (Fall)**
This course is a systematic introduction to random variables and probability distributions. Topics include standard distributions (e.g., uniform, normal, beta, gamma, F, t, Cauchy, Poisson, binomial, and hypergeometric); properties of the multivariate normal distribution and joint distributions of quadratic forms of multivariate normal; moments and cumulants; characteristic functions; exponential families; modes of convergence; central limit theorem; and other asymptotic approximations.

Prerequisite(s): (STAT 24500 or STAT 24510) and (MATH 20500 or MATH 20510), or consent of instructor.

**STAT 30600. Adv. Statistical Inference 1. 100 Units. (Quarter TBD)** *May not be offered in 2019-2020.*
Topics covered in this course will include: Gaussian distributions; conditional distributions; maximum likelihood and REML; Laplace approximation and associated expansion; combinatorics and the partition lattice; Möbius inversion; moments, cumulants, symmetric functions, and $k$-statistics; cluster expansions; Bartlett identities and Bartlett adjustment; random partitions, partition processes, and the Chinese restaurant process (CRP); Gauss-Ewens cluster process; classification models; rooted and unrooted trees; exchangeable random trees; and Cox processes used for classification.

Prerequisite(s): Consent of instructor

**STAT 30750. Numerical Linear Algebra. 100 Units. (Fall)**
This course is devoted to the basic theory of linear algebra and its significant applications in scientific computing. The objective is to provide a working knowledge of and hands-on experience with the subject suitable for graduate-level work in statistics, econometrics, quantum mechanics, and numerical methods in scientific computing. Topics include Gaussian elimination, vector spaces, linear transformations and associated fundamental subspaces, orthogonality and projections, eigenvectors and eigenvalues, diagonalization of real symmetric and complex Hermitian matrices, the spectral theorem, and matrix decompositions (QR, Cholesky, and singular value decompositions). Systematic methods applicable in high dimensions and techniques commonly used in scientific computing are emphasized. Students enrolled in the graduate-level STAT 30750 will have additional work in assignments, exams, and projects, including applications of matrix algebra in statistics and numerical computations implemented in Matlab or R. Some programming exercises will appear as optional work for students enrolled in the undergraduate-level STAT 24300.

Prerequisite(s): Multivariate calculus (MATH 15910 or MATH 16300 or MATH 16310 or MATH 19520 or MATH 20000 or MATH 20500 or MATH 20510 or MATH 20900 or PHYS 22100 or equivalent). Previous exposure to linear algebra is helpful.

Equivalent Course(s): STAT 24300

**STAT 30800. Advanced Statistical Inference II. 100 Units. (Quarter TBD)** *May not be offered in 2019-2020.*
This course will discuss the following topics in high-dimensional statistical inference: random matrix theory and asymptotics of its eigendecompositions; estimation and inference of high-dimensional covariance matrices; large-dimensional factor models; multiple testing and false discovery control; and high-dimensional semiparametrics. On the methodological side, probability inequalities, including exponential, Nagaev, and Rosenthal-type inequalities, will be introduced.

Prerequisite(s): STAT 30400, STAT 30100, and STAT 30210, or consent of instructor

**STAT 30810. High Dimensional Time Series Analysis. 100 Units. (Quarter TBD)**
This course will include lectures on the following topics: review of asymptotics for low-dimensional time series analysis (linear and nonlinear processes; nonparametric methods; spectral and time domain approaches); covariance, precision, and spectral density matrix estimation for high-dimensional time series; factor models; estimation of high-dimensional vector autoregressive processes; prediction; and high-dimensional central limit theorems under dependence.

**STAT 31150/CAAM 31150 Inverse Problems and Data Assimilation. 100 Units. (Fall)**
This class provides an introduction to Bayesian Inverse Problems and Data Assimilation, emphasizing the theoretical and algorithmic inter-relations between both subjects. We will study Gaussian approximations and optimization and sampling algorithms, including a variety of Kalman-based and particle filters as well as Markov chain Monte Carlo schemes designed for high-dimensional inverse problems.

Prerequisite(s): Familiarity with calculus, linear algebra, and probability/statistics at the level of STAT 24400 or STAT 24410. Some knowledge of ODEs may also be helpful.

Equivalent Course(s): CAAM 31150

**STAT 31200. Introduction to Stochastic Processes I. 100 Units. (Fall)**
This course introduces stochastic processes not requiring measure theory. Topics include branching processes, recurrent events, renewal theory, random walks, Markov chains, Poisson, and birth-and-death processes.

Prerequisite(s): STAT 25100 and MATH 20500; STAT 30400 or consent of instructor

Note(s): Students with credit for MATH 23500 should not enroll in STAT 31200.

**STAT 32940/FINM 33180/CAAM 32940 Multivariate Data Analysis via Matrix Decompositions. 100 Units. (Fall)**
This course is about using matrix computations to infer useful information from observed data. One may view it as an “applied” version of STAT 30900, although it is not necessary to have taken STAT 30900; the only prerequisite for this course is basic linear algebra. The data analytic tools that we will study go beyond linear and multiple regression and often fall under the heading of “Multivariate Analysis” in statistics. These include factor analysis, correspondence analysis, principal components analysis, multidimensional scaling, linear discriminant analysis, canonical correlation analysis, cluster analysis, etc. Understanding these techniques requires some facility with matrices in addition to some basic statistics, both of which the student will acquire during the course. Program elective.

Equivalent Course(s): FINM 33180, CAAM 32940

**STAT 33100. Sample Surveys. 100 Units. (Fall)**
This course covers random sampling methods; stratification, cluster sampling, and ratio estimation; and methods for dealing with nonresponse and partial response.

Prerequisite(s): Consent of instructor

**STAT 33910/FINM 33170 Financial Statistics: Time Series, Forecasting, Mean Reversion, and High Frequency Data. 100 Units. (Winter)**
This course is an introduction to the econometric analysis of high-frequency financial data. This is where the stochastic models of quantitative finance meet the reality of how the process really evolves. The course is focused on the statistical theory of how to connect the two, but there will also be some data analysis. With some additional statistical background (which can be acquired after the course), the participants will be able to read articles in the area. The statistical theory is longitudinal, and it thus complements cross-sectional calibration methods (implied volatility, etc.). The course also discusses volatility clustering and market microstructure.

Prerequisite(s): STAT 39000/FINM 34500 (may be taken concurrently), also some statistics/econometrics background as in STAT 24400–24500, or FINM 33150 and FINM 33400, or equivalent, or consent of instructor.

Equivalent Course(s): FINM 33170

**STAT 34300. Applied Linear Stat Methods. 100 Units. (Fall)**
This course introduces the theory, methods, and applications of fitting and interpreting multiple regression models. Topics include the examination of residuals, the transformation of data, strategies and criteria for the selection of a regression equation, nonlinear models, biases due to excluded variables and measurement error, and the use and interpretation of computer package regression programs. The theoretical basis of the methods, the relation to linear algebra, and the effects of violations of assumptions are studied. Techniques discussed are illustrated by examples involving both physical and social sciences data.

Prerequisite(s): Graduate student in Statistics or instructor consent.

Note(s): Students who need it should take Linear Algebra (STAT 24300 or equivalent) concurrently.

**STAT 34700. Generalized Linear Models. 100 Units. (Winter)**
This applied course covers factors, variates, contrasts, and interactions; exponential-family models (i.e., variance function); definition of a generalized linear model (i.e., link functions); specific examples of GLMs; logistic and probit regression; cumulative logistic models; log-linear models and contingency tables; inverse linear models; quasi-likelihood and least squares; estimating functions; and partially linear models.

Prerequisite(s): STAT 34300 or consent of instructor

**STAT 35920. Applied Bayesian Modeling and Inference. 100 Units. (Quarter TBD)**
This course begins with basic probability and distribution theory and covers a wide range of topics related to Bayesian modeling, computation, and inference. A significant amount of effort will be devoted to teaching students how to build and apply hierarchical models and perform posterior inference. The first half of the course will focus on basic theory, modeling, and computation using Markov chain Monte Carlo methods, and the second half will cover advanced models and applications. Computation and application will be emphasized so that students will be able to solve real-world problems with Bayesian techniques.

Prerequisite(s): STAT 24400 and STAT 24500, or master’s-level training in statistics.

Equivalent Course(s): PBHS 43010

**STAT 36900/PBHS 33300 Applied Longitudinal Data Analysis. 100 Units. (Spring)**
Longitudinal data consist of multiple measures over time on a sample of individuals. This type of data occurs extensively in both observational and experimental biomedical and public health studies, as well as in studies in sociology and applied economics. This course will provide an introduction to the principles and methods for the analysis of longitudinal data. Whereas some supporting statistical theory will be given, emphasis will be on data analysis and interpretation of models for longitudinal data. Problems will be motivated by applications in epidemiology, clinical medicine, health services research, and disease natural history studies.

Prerequisite(s): PBHS 32400/STAT 22400 or equivalent, and PBHS 32600/STAT 22600 or PBHS 32700/STAT 22700 or equivalent; or consent of instructor.

Equivalent Course(s): PBHS 33300

**STAT 37601/CMSC 25025 Machine Learning and Large-Scale Data Analysis. 100 Units. (Spring)**
This course is an introduction to machine learning and the analysis of large data sets using distributed computation and storage infrastructure. Basic machine learning methodology and relevant statistical theory will be presented in lectures. Homework exercises will give students hands-on experience with the methods on different types of data. Methods include algorithms for clustering, binary classification, and hierarchical Bayesian modeling. Data types include images, archives of scientific articles, online ad clickthrough logs, and public records of the City of Chicago. Programming will be based on Python and R, but previous exposure to these languages is not assumed.

Prerequisite(s): (CMSC 15400 or CMSC 12200) and (STAT 22200 or STAT 23400), or consent of instructor.

Note(s): The prerequisites are under review and may change.

Equivalent Course(s): CMSC 25025

**STAT 37710/CAAM 37710/CMSC 35400 Machine Learning. 100 Units. (Spring)**
This course provides hands-on experience with a range of contemporary machine learning algorithms, as well as an introduction to the theoretical aspects of the subject. Topics covered include: the PAC framework, Bayesian learning, graphical models, clustering, dimensionality reduction, kernel methods including SVMs, matrix completion, neural networks, and an introduction to statistical learning theory.

Prerequisite(s): Consent of instructor

Equivalent Course(s): CAAM 37710, CMSC 35400

**STAT 37790/CMSC 35425 Topics in Statistical Machine Learning. 100 Units. (Quarter TBD)**
“Topics in Statistical Machine Learning” is a second graduate-level course in machine learning, assuming students have had previous exposure to machine learning and statistical theory. The emphasis of the course is on statistical methodology, learning theory, and algorithms for large-scale, high-dimensional data. The selection of topics is influenced by recent research results, and students can take the course in more than one quarter.

**STAT 41530 Topics in Causal Inference (Spring)**
We will start with a light, comparative introduction to two causal inference languages: the potential outcome model and the graphical representation of causal effects. The course will then discuss topics including confounding, instrumental variables (IV), mediation analysis, and effective treatment allocations, with their applications in genetics and epidemiological research.