
Winter 2019: Computer Science 25300 / 35300, Mathematical Foundations of Machine Learning

Outline:

This course is an introduction to key mathematical concepts at the heart of machine learning. The focus is on matrix methods and statistical models, with real-world applications ranging from classification and clustering to denoising and recommender systems. Mathematical topics covered include linear equations, regression, regularization, the singular value decomposition, iterative optimization algorithms, and probabilistic models. Machine learning topics include the LASSO, support vector machines, kernel methods, clustering, dictionary learning, neural networks, and deep learning. Students are expected to have taken a course in calculus and to have exposure to numerical computing (e.g., Matlab, Python, Julia, or R). Knowledge of linear algebra and statistics is not assumed.

Appropriate for graduate students and advanced undergraduates. This course can serve as a precursor to TTIC 31020, “Introduction to Machine Learning,” or CMSC 35400.

Logistics:

Class time: Mondays and Wednesdays, 3:00-4:15pm

Instructor: Rebecca Willett

E-mail: willett@uchicago.edu

Office: 321 Crerar

Office hours: TBD when classes are in session

Prerequisites: Students are expected to have taken a course in calculus and have exposure to numerical computing (e.g. Matlab, Python, Julia, or R).

Textbooks:

Evaluation:

Graduate and undergraduate students will be expected to perform at the graduate level and will be evaluated on the same scale. All students will be evaluated through regular homework assignments, exams, and a final project. The final grade will be allocated to these components as follows:


Homework: 30%. Homework is assigned roughly weekly (about 8 assignments in total). Problems include mathematical derivations and proofs as well as applied exercises that involve writing code and working with real or synthetic data sets.

Exams: 40%. Two midterm exams (20% each). No final exam.

Final project: 30%. Students will work in groups (up to 3 students per group) to investigate a machine learning problem or technique using tools learned in class.


Letter grades will be assigned using the following hard cutoffs:

A: 93% or higher

A-: 90% or higher

B+: 87% or higher

B: 83% or higher

B-: 80% or higher

C+: 77% or higher

C: 60% or higher

D: 50% or higher

F: less than 50%

We reserve the right to curve grades, but only in a way that improves upon the grade earned under the stated cutoffs.


Tentative lecture schedule:


Weeks 1-2: Intro and Linear Models

What is ML, and how is it related to other disciplines?
Learning goals and course objectives
Vectors and matrices in machine learning models
Features and models
Least squares, linear independence, and orthogonality
Linear classifiers
Loss, risk, and generalization
Applications: bioinformatics, face recognition
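To make the first weeks concrete, here is a minimal sketch, assuming NumPy, of fitting a linear model by least squares; the data, dimensions, and noise level are invented for illustration, not course-provided code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 examples, 3 features
w_true = np.array([1.0, -2.0, 0.5])           # illustrative "true" weights
y = X @ w_true + 0.1 * rng.normal(size=100)   # noisy linear responses

# Solve min_w ||Xw - y||^2; lstsq is numerically preferable to forming
# the normal equations X^T X w = X^T y directly.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)                                  # should be close to w_true
```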


Week 3: Singular Value Decomposition and Principal Component Analysis

Dimensionality reduction
Applications: recommender systems, PageRank
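A minimal sketch, assuming NumPy, of the week's core computation: truncating the SVD gives the best rank-k approximation of a data matrix and a k-dimensional representation of each row. For PCA proper, the columns would be centered first; the matrix and k are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))                 # rows = examples, columns = features
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation (Eckart-Young)
scores = A @ Vt[:k, :].T                      # k-dimensional coordinates of each row
print(np.linalg.norm(A - A_k))                # approximation error
```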


Week 4: Overfitting and Regularization

Ridge regression
The LASSO and proximal point algorithms
Model selection and cross-validation
Applications: image deblurring, compressed sensing
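A minimal sketch, assuming NumPy, of ridge regression in closed form and the LASSO solved by a proximal-gradient (ISTA) iteration; the regularization weight, step size, and iteration count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)   # sparse ground truth

# Ridge regression: w = (X^T X + lam I)^{-1} X^T y
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

# LASSO via ISTA: gradient step on the quadratic term, then soft-thresholding.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

step = 1.0 / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the gradient
w_lasso = np.zeros(10)
for _ in range(500):
    grad = X.T @ (X @ w_lasso - y)
    w_lasso = soft_threshold(w_lasso - step * grad, step * lam)

print(w_ridge.round(2))                       # small but dense coefficients
print(w_lasso.round(2))                       # mostly exact zeros
```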


Weeks 5-6: Beyond Least Squares: Alternate Loss Functions

Hinge loss
Logistic regression
Feature functions for nonlinear regression and classification
Kernel methods and support vector machines
Application: handwritten digit classification
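For the digit-classification application, here is a minimal sketch assuming scikit-learn is available; the RBF-kernel SVM and its hyperparameters are illustrative choices, not a prescribed course setup.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)           # 8x8 grayscale digits, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # illustrative hyperparameters
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                  # held-out accuracy
```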


Week 7: Iterative Methods

Stochastic Gradient Descent (SGD)
Neural networks and backpropagation
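A minimal sketch, assuming NumPy, of stochastic gradient descent on the least-squares loss, updating with one randomly chosen example per step; the step size and iteration count are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

w = np.zeros(5)
step = 0.01                                   # fixed step size, illustrative
for t in range(5000):
    i = rng.integers(len(y))                  # pick one example at random
    grad = (X[i] @ w - y[i]) * X[i]           # gradient of (x_i^T w - y_i)^2 / 2
    w -= step * grad

print(np.linalg.norm(w - w_true))             # should be small
```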


Week 8: Statistical Models

Density estimation and maximum likelihood estimation
Gaussian mixture models and Expectation-Maximization
Unsupervised learning and clustering
Application: text classification
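A minimal sketch, assuming NumPy and SciPy, of Expectation-Maximization for a two-component Gaussian mixture in one dimension; the data and initialization are invented for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

pi = 0.5                                      # weight of component 1
mu = np.array([-1.0, 1.0])
sd = np.array([1.0, 1.0])
for _ in range(100):
    # E-step: posterior responsibility of component 1 for each point
    p0 = (1 - pi) * norm.pdf(x, mu[0], sd[0])
    p1 = pi * norm.pdf(x, mu[1], sd[1])
    r = p1 / (p0 + p1)
    # M-step: re-estimate mixture weight, means, and standard deviations
    pi = r.mean()
    mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
    sd = np.sqrt(np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                           np.average((x - mu[1]) ** 2, weights=r)]))

print(pi, mu, sd)                             # roughly 0.4, (-2, 3), (1, 1)
```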


Week 9: Ensemble Methods

AdaBoost
Decision trees
Random forests and bagging
Application: electronic health record analysis
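A minimal sketch, assuming scikit-learn, comparing a random forest with AdaBoost on synthetic data; the dataset and hyperparameters are invented for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              AdaBoostClassifier(n_estimators=100, random_state=0)):
    acc = cross_val_score(model, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(type(model).__name__, round(acc, 3))
```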