|
Date
|
Module
|
Topics
|
|
Mon, Aug 28
|
Productivity Tools
|
RStudio, RStudio Projects, Quarto, Unix
|
|
Wed, Aug 30
|
Productivity Tools
|
Git and GitHub
|
|
Mon, Sep 4
|
No class
|
Labor Day
|
|
Wed, Sep 6
|
R
|
-
R Basics: The workspace, data types, coercing, lists, packages, namespaces, help, creating vectors, object-oriented programming.
-
Vectorization: Vector arithmetic, sapply, split, cut, lapply, subsetting, sorting
|
|
Mon, Sep 11
|
R
|
-
Introduction to Tidyverse: tidy data, mutate, select, filter, the pipe, summarize, group_by, sorting, and the purrr package
-
Dates and time: Date class and the lubridate package
|
|
Wed, Sep 13
|
R
|
-
Importing data
-
File types: binary, ascii, unicode
-
Locales
-
Downloading files
-
The data.table package
|
|
Mon, Sep 18
|
Data visualization
|
-
Visualizing Distributions: Summary statistics, distributions, histograms, smooth densities, the normal distribution, quantiles, percentiles, and boxplots.
-
Grammar of graphics and the basics of the ggplot2 package
|
|
Wed, Sep 20
|
Data visualization
|
-
Data visualization principles
-
ggplot2 geometries
|
|
Fri, Sep 22
|
Problem set 1 due
|
|
|
Mon, Sep 25
|
Data wrangling
|
-
Reshaping data
-
Joining tables
|
|
Wed, Sep 27
|
Data wrangling
|
-
Web scraping
-
String processing
-
Text mining
|
|
Mon, Oct 2
|
Probability
|
-
Monte Carlo simulations
-
Random Variables
-
Central Limit Theorem
-
Probability case studies: Roulette, Poker, Birthday problem, Monty Hall, insurance
|
|
Wed, Oct 4
|
Inference
|
-
Polls
-
Guess the proportion of blue beads competition
-
Confidence intervals
-
Data-driven models
|
|
Fri, Oct 6
|
Final project title due
|
Submit a title and describe your plans to obtain data
|
|
Mon, Oct 9
|
No class
|
Indigenous Peoples’ Day
|
|
Wed, Oct 11
|
Inference
|
-
Bayesian statistics
-
Hierarchical Models
-
Case study: election forecasting
|
|
Mon, Oct 16
|
Midterm 1
|
Includes all topics covered by October 11.
|
|
Wed, Oct 18
|
Linear Models
|
-
Regression and correlation
-
Case study: is height hereditary?
-
Bivariate normal distribution, conditional expectations, least squares estimates
|
|
Mon, Oct 23
|
Linear Models
|
-
Multivariable regression
-
Case study: build a baseball team
|
|
Wed, Oct 25
|
Linear Models
|
-
Measurement error models
-
Treatment effect models
-
Case study: does a high-fat diet increase weight in mice?
|
|
Mon, Oct 30
|
Linear Models
|
-
Association tests
-
Correlation is not causation
|
|
Wed, Nov 1
|
High dimensional data
|
-
Matrices in R
-
Case study: handwritten digits
|
|
Fri, Nov 3
|
Problem set 2 due
|
One paragraph description of projects that includes what dataset will be used.
|
|
Mon, Nov 6
|
High dimensional data
|
Dimension reduction: Linear algebra, distance, PCA
|
|
Wed, Nov 8
|
High dimensional data
|
-
Dimension reduction continued
-
Case study: gene expression differences between ethnic groups.
|
|
Fri, Nov 10
|
Project description due
|
One paragraph description of projects that includes what dataset will be used.
|
|
Mon, Nov 13
|
High dimensional data
|
-
Regularization
-
Case study: Recommendation systems in movie ratings
-
Matrix factorization
|
|
Wed, Nov 15
|
Machine Learning
|
-
Introduction, definition of concepts, accuracy, test, training and validation sets
-
Evaluation metrics: ROC curves, precision-recall curves
|
|
Mon, Nov 20
|
Midterm 2
|
Includes topics covered until Nov 15.
|
|
Wed, Nov 22
|
No class
|
Thanksgiving
|
|
Mon, Nov 27
|
Machine Learning
|
-
Smoothing
-
Case study: Death rates after natural disasters
|
|
Wed, Nov 29
|
Machine Learning
|
-
Cross-Validation
-
caret package
|
|
Mon, Dec 4
|
Machine Learning
|
Examples of algorithms
-
Case study: reading handwritten digits
|
|
Wed, Dec 6
|
Other topics
|
Possible topics (open to student requests)
-
Shiny
-
Interactive graphics: plotly
-
Advanced Quarto
-
Research topics
-
Large language models
-
Deep learning
|
|
Fri, Dec 8
|
Problem set 3 due
|
|
|
Mon, Dec 11
|
Help with project
|
|
|
Wed, Dec 13
|
Help with project
|
|
|
Fri, Dec 15
|
Final project due
|
|