Schedule

The schedule is subject to change.

Date Module Topics
Mon, Aug 28 Productivity Tools

 RStudio, RStudio Projects, Quarto, Unix

Wed, Aug 30 Productivity Tools

Git and GitHub      

Mon, Sep 4

No class

Labor Day
Wed, Sep 6

R

  • R Basics: The workspace, data types, coercing, lists, packages, namespaces, help, creating vectors, object-oriented programming.
  • Vectorization: Vector arithmetic, sapply, split, cut, lapply, subsetting, sorting
Mon, Sep 11
  • Introduction to Tidyverse: tidy data, mutate, select, filter, the pipe, summarize, group_by, sorting, and the purrr package
  • Dates and time: Date class and the lubridate package
Wed, Sep 13 R
  • Importing data
    • File types: binary, ascii, unicode
    • Locales
    • Downloading files
  • The data.table package
Mon, Sep 18 Data visualization
  • Visualizing Distributions: Summary statistics, distributions, histograms, smooth densities, the normal distribution, quantiles, percentiles, and boxplots.
  • Grammar of graphics and the basics of the ggplot2 package
Wed, Sep 20 Data visualization
  • Data visualization principles
  • ggplot2 geometries
Fri, Sep 22 Problem set 1 due
Mon, Sep 25 Data wrangling
  • Reshaping data
  • Joining tables
Wed, Sep 27 Data wrangling
  • Web scraping
  • String processing
  • Text mining

Mon, Oct 2

Probability
  • Monte Carlo simulations
  • Random Variables
  • Central Limit Theorem
  • Probability case studies: Roulette, Poker, Birthday problem, Monty Hall, insurance

Wed, Oct 4

Inference
  • Polls
  • Guess the proportion of blue beads competition
  • Confidence intervals
  • Data-driven models

Fri, Oct 6

Final project title due

 Submit a title and describe your plans to obtain data

Mon, Oct 9

No class Indigenous Peoples’ Day

Wed, Oct 11

Inference
  • Bayesian statistics
  • Hierarchical Models
  • Case study: election forecasting

Mon, Oct 16

Midterm 1
Includes all topics covered by October 11.

Wed, Oct 18

Linear Models
  • Regression and correlation
    • Case study: is height hereditary?
    • Bivariate normal distribution, conditional expectations, least squares estimates

Mon, Oct 23

Linear Models
  • Multivariable regression
    • Case study: build a baseball team

Wed, Oct 25

Linear Models
  • Measurement error models
  • Treatment effect models 
  • Case study: does a high-fat diet increase weight in mice?

Mon, Oct 30

Linear Models
  • Association tests
  • Correlation is not causation

Wed, Nov 1

High dimensional data
  • Matrices in R 
  • Case study: handwritten digits

Fri, Nov 3 

Problem set 2 due. One-paragraph description of projects that includes what dataset will be used.

Mon, Nov 6

High dimensional data

  • Dimension reduction: Linear algebra, distance, PCA

Wed, Nov 8

High dimensional data
  • Dimension reduction continued
  • Case study: gene expression differences between ethnic groups.

Fri, Nov 10 

Project description due. One-paragraph description of the project that includes which dataset will be used.

Mon, Nov 13

High dimensional data

  • Regularization
    • Case study: Recommendation systems in movie ratings
  • Matrix factorization

Wed, Nov 15

Machine Learning
  • Introduction, definition of concepts, accuracy, test, training and validation sets
  • Evaluation metrics: ROC curves, precision-recall curves

Mon, Nov 20

Midterm 2
Includes topics covered until Nov 15.

Wed, Nov 22

No class  Thanksgiving

Mon, Nov 27

Machine Learning
  • Smoothing
  • Case study: Death rates after natural disasters

Wed, Nov 29

Machine Learning
  • Cross-Validation
  • caret package

Mon, Dec 4

Machine Learning

  • Examples of algorithms

  • Case study: reading handwritten digits

Wed, Dec 6 

Other topics

Possible topics (open to student requests)

  • Shiny
  • Interactive graphics: plotly
  • Advanced Quarto
  • Research topics
  • Large language models
  • Deep learning

Fri, Dec 8

Problem set 3 due

Mon, Dec 11

Help with project

Wed, Dec 13

Help with project

Fri, Dec 15

Final project due