Statistics for Research
Welcome!
This course was taught during a previous DSEER summer bootcamp and is now available for independent learners and teachers! The lessons are split into nine days plus a bonus tenth day, but feel free to work at whatever pace is most comfortable for you.
Course description:
An introduction to basic statistical techniques used in environmental research and beyond, giving students a toolkit of methods for understanding datasets in their own work: regression and inference, time series modeling, and Bayesian statistics.
Day 1: Introduction
- Overview of formatting and syntax in R
- Reviewing how to use vectors, lists, matrices, arrays, factors, and data frames
- Using control flow (if/else statements, for loops, and while loops), writing functions, and plotting histograms
Day 2: Linear Regression
- Exploring data: general principles, apply functions, and graphics
- Linear regression: the normal distribution, deriving the OLS estimator, and the OLS assumptions
- Interpreting output, hypothesis testing, and generating confidence intervals (CIs)
- Diagnostics and model building
Day 3: Transformations + Multiple Testing
- Adapting a linear model framework for special situations
- An overview of p-hacking and its dangers
- Analyzing R-squared and its usefulness as a measure of model fit
- Using the Bonferroni correction to adjust the significance level across multiple tests
Day 4: Logistic Regression
- How and why we use mathematical formulations to analyze data
- What is logistic regression, and how do we use it?
- Overview of linear separability
- Machine learning applications of the lessons
Day 5: LASSO and Bootstrapping
R Notebook PDF
- LASSO for linear regression
- Using bootstrapping in modern statistics to improve confidence intervals and as a diagnostic tool for checking assumptions
Day 6: Intro to Bayesian Statistics
- Overview of Bayesian Statistics
- The intuition behind Bayes' theorem
- The Bayes Factor
- Bayesian estimation
Day 7: Empirical Bayes
- Examples of Empirical Bayes
- Method of moments
- Maximum likelihood estimation (MLE), including built-in functions for computing it
- Bayesian Confidence Intervals & Posterior Credible Intervals
Day 8: Gaussian Processes
- The multivariate normal distribution
- An outline of Gaussian processes and their pros and cons
- Kriging: making predictions
Day 9: Conformal Prediction
- The usual assumptions vs. the new ones conformal prediction relies on
- Conformal prediction
- Using conformal prediction to create prediction sets for classification
Bonus! Day 10: Time Series
- What is a time series?
- How are time series special?
- What are stationarity, non-stationarity, and autocorrelation?
- Checking stationarity with ACF plots
- Forecasting
Reference PDF
A cheat sheet listing the functions covered each day, their definitions, and additional resources for learners.
Resources for Instructors
Supplementary materials for instructors planning on teaching this bootcamp in their classrooms/departments/programs.
Feedback
See what previous students have said about our bootcamps, and contribute your own feedback.