Statistics for Research

 

Welcome!

This course was taught during a previous DSEER summer bootcamp and is now available for independent learners and teachers! The lessons are split into nine days, plus a bonus tenth day, but feel free to go at the pace that is most comfortable for you.

Course description:

An introduction to basic statistical techniques used in environmental research and beyond, giving students a toolkit of methods to understand datasets in their own work: regression and inference, time series modeling, and Bayesian statistics.

 

Click here to download all the R files for this course

Day 1: Introduction

R Notebook PDF

  • Overview of formatting and syntax in R
  • Reviewing how to use vectors, lists, matrices, arrays, factors, and data frames
  • Using control flow (if/else statements, for loops, while loops), writing functions, and plotting histograms

Day 2: Linear Regression

R Notebook PDF

  • Data exploration: general principles, the apply family of functions, and graphics
  • Linear regression: the normal distribution, deriving ordinary least squares (OLS), and the OLS assumptions
  • Interpreting output, hypothesis testing, and constructing confidence intervals (CIs)
  • Diagnostics and model building

Day 3: Transformations + Multiple Testing

R Notebook PDF

  • Adapting a linear model framework for special situations
  • An overview of p-hacking and its dangers
  • Analyzing R-squared and its efficiency
  • Using the Bonferroni correction to determine the significance level
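
As a taste of the Bonferroni correction listed above, here is a minimal sketch in base R. The p-values are made-up numbers for illustration only, not course data. With m tests and an overall level alpha, each p-value is compared to alpha / m; the built-in `p.adjust()` gives the same decisions by scaling the p-values instead.

```r
# Hypothetical p-values from m = 5 tests (illustrative numbers, not course data)
p_values <- c(0.004, 0.020, 0.030, 0.045, 0.300)
alpha <- 0.05
m <- length(p_values)

# Bonferroni: compare each p-value to alpha / m
threshold <- alpha / m
significant <- p_values < threshold

# Equivalent view: multiply each p-value by m (capped at 1) and compare to alpha
adjusted <- p.adjust(p_values, method = "bonferroni")
significant_adjusted <- adjusted < alpha

print(threshold)
print(significant)
print(adjusted)
```

Only the first test survives the correction here, even though four of the five raw p-values fall below 0.05.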

Day 4: Logistic Regression

R Notebook PDF

  • How and why we use mathematical formulations to analyze data
  • What logistic regression is and how to use it
  • Overview of linear separability
  • Machine learning applications of the lessons

Day 5: LASSO and Bootstrapping

R Notebook PDF

  • A focus on LASSO linear regression
  • Bootstrapping in modern statistics: improving confidence intervals and checking model assumptions
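
To give a flavor of bootstrapping, the sketch below builds a percentile bootstrap confidence interval for a mean in base R. The data are simulated and the number of resamples `B` is an arbitrary choice; neither is taken from the course notebooks.

```r
set.seed(1)

# Simulated skewed sample (illustrative data only)
x <- rexp(50, rate = 1)

# Resample the data with replacement B times and record the mean each time
B <- 2000
boot_means <- replicate(B, mean(sample(x, replace = TRUE)))

# Percentile bootstrap 95% confidence interval for the mean
ci <- quantile(boot_means, c(0.025, 0.975))
print(ci)
```

The percentile interval is one of several bootstrap intervals; it is used here only because it is the shortest to write down.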

Day 6: Intro to Bayesian Statistics

R Notebook PDF

  • Overview of Bayesian statistics
  • The intuition for Bayes' theorem
  • The Bayes factor
  • Bayesian estimation

Day 7: Empirical Bayes

R Notebook PDF

  • Examples of Empirical Bayes
  • Method of moments
  • Maximum Likelihood Estimation (MLE) and using built-in functions to do this
  • Bayesian Confidence Intervals & Posterior Credible Intervals

Day 8: Gaussian Processes

R Notebook PDF

  • Multivariate Normal
  • An outline of Gaussian processes & their pros/cons
  • Kriging and making predictions

Day 9: Conformal Prediction

R Notebook PDF

  • Usual assumptions vs new ones
  • Conformal prediction
  • Using conformal prediction to create prediction sets for classification

Bonus Day! Day 10: Time Series

R Notebook PDF

  • What is a time series?
  • How are time series special?
  • What is stationarity, non-stationarity and autocorrelation?
  • Checking stationarity with ACF plots
  • Forecasting
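
As a preview of checking stationarity with the autocorrelation function, here is a small sketch in base R. Both series are simulated for illustration and are not course data. White noise is stationary, so its autocorrelation should sit near zero after lag 0, while a random walk is non-stationary and its autocorrelation decays slowly.

```r
set.seed(1)

noise <- rnorm(200)     # stationary white noise
walk <- cumsum(noise)   # random walk built from the same noise (non-stationary)

# plot = FALSE returns the autocorrelations instead of drawing the ACF plot
acf_noise <- acf(noise, lag.max = 20, plot = FALSE)
acf_walk <- acf(walk, lag.max = 20, plot = FALSE)

# The first entry is lag 0 (always 1), so the second entry is lag 1
lag1_noise <- acf_noise$acf[2]
lag1_walk <- acf_walk$acf[2]
print(c(noise = lag1_noise, walk = lag1_walk))
```

Calling `acf(walk)` without `plot = FALSE` draws the usual ACF plot with its significance bands.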

Reference PDF

Cheat sheet covering the functions introduced each day, their definitions, and additional resources for learners.

Resources for Instructors

Supplementary materials for instructors planning on teaching this bootcamp in their classrooms/departments/programs.

Feedback

See what previous generations of students have to say about our bootcamps and contribute your own feedback.