MATH 448: Introduction to Statistical Learning and Data Mining

Prerequisites & Bulletin Description

Course Objectives

Upon completion of this course a student should be able to:

  • Demonstrate a strong conceptual understanding of statistical learning.
  • Explain the statistical principles behind many of the approaches to supervised & unsupervised learning.
  • Perform model selection & evaluation and effectively communicate the results.
  • Rigorously analyze data using modern statistical methods and computer software.
  • Apply the skills learned throughout the course in hands-on analysis of real data sets.

Evaluation of Students

Students will be graded on written homework assignments, data analysis projects, and midterm and final examinations.

Course Outline

The following timeline is approximate. 

Topics & Length
Topics                                                                                                         Number of Weeks
Introduction to statistical learning                                                                           1 week
Linear regression                                                                                              1 week
Classification                                                                                                 3 weeks
Methods for model evaluation, model selection and regularization                                               3 weeks
Nonparametric approaches: nearest neighbors, splines, generalized additive models and support vector machines  3 weeks
Ensemble methods: bagging, boosting and random forests                                                         2 weeks
Unsupervised learning: dimensionality reduction and clustering                                                 2 weeks

Textbooks & Software

An Introduction to Statistical Learning, with Applications in R (2013) by G. James, D. Witten, T. Hastie and R. Tibshirani.
R by the R Development Core Team.

Submitted by: Tao He 
Date: May 2, 2016