BST 260 Introduction to Data Science

Course Information

Lectures

Lecture slides, class notes, and problem sets are linked below. New material is added roughly every week.

| Dates | Topic | Slides | Reading |
| --- | --- | --- | --- |
| Sep 04 | Productivity Tools | Intro, Unix | Installing R and RStudio on Windows or Mac, Getting Started, Unix |
| Sep 09, Sep 11 | Productivity Tools | RStudio, Quarto, Git and GitHub | RStudio Projects, Quarto, Git |
| Sep 16, Sep 19 | R | R basics, Vectorization | R Basics, Vectorization |
| Sep 23 | R | Tidyverse, ggplot2 | dplyr, ggplot2 |
| Sep 25 | R | Tidying data | Reshaping Data |
| Sep 30, Oct 02 | Wrangling | Intro, Data Importing, Dates and Times, Locales, Data APIs, Web scraping, Joining tables | Importing data, dates and times, Locales, Joining Tables, Extracting data from the web |
| Oct 07, Oct 09 | Data visualization | Data Viz Principles, Distributions, Dataviz in practice | Distributions, Dataviz Principles |
| Oct 16 | Midterm 1 | Covers material from Sep 04-Oct 11 | |
| Oct 21 | Probability | Intro, Foundations for Inference | Monte Carlo, Random Variables & CLT |
| Oct 23 | Inference | Intro, Parameters and estimates, Confidence Intervals | Parameters & Estimates, Confidence Intervals |
| Oct 28, Oct 30 | Statistical Models | Models, Bayes, Hierarchical Models | Data-driven Models, Bayesian Statistics, Hierarchical Models |
| Nov 04, Nov 06 | Linear models | Intro, Regression | Regression, Multivariate Regression |
| Nov 13, Nov 18 | Linear models | Multivariate regression, Treatment effect models | Measurement Error Models, Treatment Effect Models, Association Tests, Association Not Causation |
| Nov 20 | High dimensional data | Intro to Linear Algebra, Matrices in R | Matrices in R, Applied Linear Algebra |
| Nov 25 | Midterm 2 | Covers material from Sep 04-Nov 22 | |
| Dec 02 | High dimensional data | Distance, Dimension reduction | Dimension Reduction |
| Dec 04 | Machine Learning | Intro, Metrics, Conditionals, Smoothing | Notation and terminology, Evaluation Metrics, conditional probabilities, smoothing |
| Dec 09, Dec 11 | Machine Learning | kNN, Resampling methods, caret package, Algorithms, ML in practice | Resampling methods, ML algorithms, ML in practice |
| Dec 16, Dec 18 | Other topics | Association is not causation, Shiny, Shiny example code | Association is not causation |

Problem sets

| Problem set | Topic | Due Date | Difficulty |
| --- | --- | --- | --- |
| 01 | Unix, Quarto | Sep 11 | easy |
| 02 | R | Sep 19 | easy |
| 03 | Tidyverse | Sep 27 | medium |
| 04 | Wrangling | Oct 04 | medium |
| 05 | COVID-19 data visualization | Oct 11 | medium |
| 06 | Probability | Oct 25 | easy |
| 07 | Predict the election | Nov 04 | hard |
| 08 | Excess mortality after Hurricane María | Nov 15 | medium |
| 09 | Matrices | Nov 22 | easy |
| 10 | Digit reading | Dec 16 | hard |
| Final Project | Your choice | Dec 20 | hard |

Office hour times

| Meeting | Time | Location |
| --- | --- | --- |
| Rafael Irizarry | Mon 8:45-9:45AM | Kresge 203 |
| Corri Sept | Tue 3:00-4:00PM | Kresge 201 |
| Nikhil Vytla | Wed 2:00-3:00PM | Kresge 201 |
| Yuan Wang | Fri 1:00-2:00PM | Zoom |

Acknowledgments

We thank Maria Tackett and Mine Çetinkaya-Rundel for sharing their web page template, which we used in creating this website.