BST 260 Introduction to Data Science

Course Information

Lectures

Lecture slides, class notes, and problem sets are linked below. New material is added roughly every week.

Dates | Topic | Slides | Reading | Instructor(s)
Sep 03 | Productivity Tools | Intro, Unix | Installing R and RStudio on Windows or Mac, Getting Started, Unix | Robert
Sep 08, Sep 10 | Productivity Tools | RStudio, Quarto, Git and GitHub | RStudio Projects, Quarto, Git and GitHub Tutorial, Git and GitHub Book Reading | Robert
Sep 15, Sep 17 | R | R basics, Vectorization | R Basics, Vectorization | Robert
Sep 22, Sep 24 | R | Tidyverse, ggplot2, Tidying Data | dplyr, ggplot2, Reshaping Data | Robert
Sep 29, Oct 01 | Wrangling | Intro, Data Importing, Dates and Times, Locales, Data APIs, Web Scraping, Joining tables | Importing Data, dates and times, Locales, Joining Tables, Extracting data from the web | Anthony
Oct 06, Oct 08 | Data visualization | Data Viz Principles, Distributions, Dataviz in practice | Distributions, Dataviz Principles | Anthony
Oct 15 | Midterm 1 | Covers material from Sep 03-Oct 08 | | Anthony
Oct 20 | Probability | Intro, Foundations for Inference | Monte Carlo, Random Variables, Central Limit Theorem | Anthony
Oct 22 | Inference | Intro, Parameters and Estimates, Confidence Intervals | Parameters & Estimates, Confidence Intervals | Anthony
Oct 27, Oct 29 | Statistical Models | Models, Bayes, Hierarchical Models | Data-driven Models, Bayesian Statistics, Hierarchical Models | Anthony
Nov 03, Nov 05 | Linear models | Intro, Regression | Regression, Multivariate Regression | Robert
Nov 10, Nov 12 | Linear models | Multivariate Regression, Treatment Effect Models, Association is Not Causation | Measurement Error Models, Treatment Effect Models, Association Tests, Association is Not Causation | Robert
Nov 17, Nov 19 | High dimensional data | Intro to Linear Algebra, Matrices in R, Distance, Dimension Reduction | Matrices in R, Applied Linear Algebra, Dimension Reduction | Robert
Nov 24 | Midterm 2 | Covers material from Sep 03-Nov 12 | | Robert
Dec 01, Dec 03 | Machine Learning | Intro, Metrics, Conditionals, Smoothing | Notation and Terminology, Evaluation Metrics, Conditional Probabilities, Smoothing | Robert
Dec 08, Dec 10 | Machine Learning | kNN, Resampling Methods, caret Package, Algorithms | Resampling Methods, ML Algorithms | Anthony
Dec 15 | Machine Learning | ML in Practice | ML in Practice | Anthony
Dec 17 | Other topics | Shiny Example Code | Shiny Basics | Robert

Problem Sets

Problem Set | Topic | Due Date | Difficulty
Problem Set 1 | Unix, Quarto | Sep 12 | easy
Problem Set 2 | R | Sep 18 | medium
Problem Set 3 | Tidyverse | Sep 28 | hard
Problem Set 4 | Wrangling | Oct 05 | hard
Problem Set 5 | COVID-19 data visualization | Oct 12 | medium
Problem Set 6 | Probability | Oct 26 | easy
Problem Set 7 | Predict the election | Nov 05 | hard
Problem Set 8 | Excess mortality after Hurricane María | Nov 16 | medium
Problem Set 9 | Matrices | Nov 23 | easy
Problem Set 10 | Digit reading | Dec 19 | hard
Final Project | NHANES Data Analysis | Dec 15 | hard

Office Hour Times

Name | Meeting Time | Location
Robert Gentleman | Monday, 1:00 pm to 2:00 pm | Building 2 Room 437F (October via Appointment)
Anthony Christidis | Friday, 11:30 am to 12:30 pm | Zoom
Angela Wang | Monday, 3:45 pm to 4:45 pm | Kresge 204 (except 11/3, which will be in FXB G13)
Ava Harrington | Thursday, 2:00 pm to 3:00 pm | Kresge 204
Emma Crenshaw | Tuesday, 10:00 am to 11:00 am | FXB G03 (except 9/9, which will be in Kresge 201)
Jing Li | Wednesday, 1:30 pm to 2:30 pm | Kresge LL6

Acknowledgments

For the Fall 2025 iteration of BST 260, the course website was modified by Anthony Christidis, building on the Fall 2024 course template. We thank Maria Tackett and Mine Çetinkaya-Rundel for sharing their web page template, which we used in creating this website.