BST 260 Introduction to Data Science

Course Information

Lectures

Lecture slides, class notes, and problem sets are linked below. New material is added approximately weekly.

| Dates | Topic | Slides | Reading | Instructor(s) |
|---|---|---|---|---|
| Sep 03 | Productivity Tools | Intro, Unix | Installing R and RStudio on Windows or Mac, Getting Started, Unix | Robert |
| Sep 08, Sep 10 | Productivity Tools | RStudio, Quarto, Git and GitHub | RStudio Projects, Quarto, Git and GitHub Tutorial, Git and GitHub Book Reading | Robert |
| Sep 15, Sep 18 | R | R basics, Vectorization | R Basics, Vectorization | Robert |
| Sep 22 | R | Tidyverse, ggplot2 | dplyr, ggplot2 | Robert |
| Sep 24 | R | Tidying data | Reshaping Data | Robert |
| Sep 29, Oct 01 | Wrangling | Intro, Data Importing, Dates and Times, Locales, Data APIs, Web scraping, Joining tables | Importing data, dates and times, Locales, Joining Tables, Extracting data from the web | Anthony |
| Oct 06, Oct 08 | Data visualization | Data Viz Principles, Distributions, Dataviz in practice | Distributions, Dataviz Principles | Anthony |
| Oct 15 | Midterm 1 | Covers material from Sep 03-Oct 08 | | Anthony |
| Oct 20 | Probability | Intro, Foundations for Inference | Monte Carlo, Random Variables & CLT | Anthony |
| Oct 22 | Inference | Intro, Parameters and estimates, Confidence Intervals | Parameters & Estimates, Confidence Intervals | Anthony |
| Oct 27, Oct 29 | Statistical Models | Models, Bayes, Hierarchical Models | Data-driven Models, Bayesian Statistics, Hierarchical Models | Anthony |
| Nov 03, Nov 05 | Linear models | Intro, Regression | Regression, Multivariate Regression | Robert |
| Nov 12, Nov 17 | Linear models | Multivariate regression, Treatment effect models | Measurement Error Models, Treatment Effect Models, Association Tests, Association Not Causation | Robert |
| Nov 19 | High dimensional data | Intro to Linear Algebra, Matrices in R | Matrices in R, Applied Linear Algebra | Robert |
| Nov 24 | Midterm 2 | Covers material from Sep 03-Nov 19 | | Robert |
| Dec 01 | High dimensional data | Distance, Dimension reduction | Dimension Reduction | Robert |
| Dec 03 | Machine Learning | Intro, Metrics, Conditionals, Smoothing | Notation and terminology, Evaluation Metrics, conditional probabilities, smoothing | Anthony |
| Dec 08, Dec 10 | Machine Learning | kNN, Resampling methods, caret package, Algorithms, ML in practice | Resampling methods, ML algorithms, ML in practice | Anthony |
| Dec 15, Dec 17 | Other topics | Association is not causation, Shiny, Shiny example code | Association is not causation | Robert, Anthony |

Problem Sets

| Problem set | Topic | Due Date | Difficulty |
|---|---|---|---|
| Problem Set 1 | Unix, Quarto | Sep 12 | easy |
| Problem Set 2 | R | Sep 18 | medium |
| Problem Set 3 | Tidyverse | Sep 26 | medium |
| Problem Set 4 | Wrangling | Oct 3 | medium |
| Problem Set 5 | Covid 19 data visualization | Oct 10 | medium |
| Problem Set 6 | Probability | Oct 24 | easy |
| Problem Set 7 | Predict the election | Nov 03 | hard |
| Problem Set 8 | Excess mortality after Hurricane María | Nov 14 | medium |
| Problem Set 9 | Matrices | Nov 21 | easy |
| Problem Set 10 | Digit reading | Dec 15 | hard |
| Final Project | Your choice | Dec 19 | hard |

Office Hour Times

| Name | Meeting Time | Location |
|---|---|---|
| Robert Gentleman | Monday, 11:30 am to 12:30 pm | TBD |
| Anthony Christidis | Friday, 12:30 pm to 1:30 pm | Zoom |
| Angela Wang | Monday, 3:45 pm to 4:45 pm | Kresge 204 (except 11/3, which will be in FXB G13) |
| Ava Harrington | Thursday, 2:00 pm to 3:00 pm | Kresge 204 |
| Emma Crenshaw | Tuesday, 10:00 am to 11:00 am | FXB G03 (except 9/9, which will be in Kresge 201) |
| Jing Li | Wednesday, 1:30 pm to 2:30 pm | Kresge LL6 |

Acknowledgments

For the Fall 2025 iteration of BST 260, the course website was modified by Anthony Christidis, building on the Fall 2024 course template. We thank Maria Tackett and Mine Çetinkaya-Rundel for sharing their web page template, which we used in creating this website.