Unprecedented advances in digital technology since the second half of the 20th century are transforming science, including health and biomedical research. Fields that traditionally relied on simple analyses of small datasets have recently been transformed by technologies that continue to expand our ability to observe and decipher molecular entities. This course draws on concepts from Statistics, Computer Science, and Software Engineering. We will learn the skills needed to manage and analyze data, covering concepts such as exploratory data analysis, statistical inference and modeling, machine learning, and high-dimensional data analysis. We will also learn the skills needed to develop data products, including R programming, data wrangling, reproducible research, and communicating results.

Why take this course?

The goal of this course is to teach students how to answer questions with data. To do this, we will learn the skills needed to manage and analyze data through case studies. We will cover concepts such as exploratory data analysis, statistical inference and modeling, machine learning, and high-dimensional data analysis. We will also learn the skills needed to develop data products, including R programming, data wrangling, reproducible research, and communicating results. All class material will be motivated by real-life examples involving data. We will use the R programming language.

What is the structure of this course?

We will learn these concepts through six data analysis projects. Grades will be based on:

  • 5-6 homework assignments (35%)
  • 2 midterms (30%)
  • 1 final project (35%)
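As a rough illustration of how these weights combine, the sketch below computes an overall score from the three components. It assumes each component has already been averaged onto a 0-100 scale; the function name and that averaging step are hypothetical and are not part of the official grading procedure.

```python
# Weights taken from the list above, in percent (they sum to 100).
WEIGHTS = {"homework": 35, "midterms": 30, "final_project": 35}


def course_score(homework, midterms, final_project):
    """Combine component scores (each assumed to be on a 0-100 scale).

    Assumption: each component is a single averaged percentage; how
    individual assignments are averaged is not specified in the syllabus.
    """
    scores = {
        "homework": homework,
        "midterms": midterms,
        "final_project": final_project,
    }
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / 100


if __name__ == "__main__":
    print(course_score(homework=80, midterms=90, final_project=70))
```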

Late Day Policy

Each student is given six late days for homework at the beginning of the semester. A late day extends the individual homework deadline by 24 hours without penalty. No more than two late days may be used on any one assignment. We do not accept homework under any circumstances more than 48 hours after the original deadline; such assignments will not be graded. Late days are intended to give you flexibility: you can use them for any reason, no questions asked. You don’t get any bonus points for not using your late days. Also, you can only use late days for the individual homework deadlines; all other deadlines (e.g., project milestones) are hard.

Although each student is given a total of only six late days, we will accept homework from students who exceed this limit. However, we will deduct 2 points for each extra late day. For example, if you have already used all of your late days for the semester, we will deduct 2 points for assignments less than 24 hours late, and 4 points for assignments 24-48 hours late.
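The late day rules above can be sketched as a small calculation. This is an illustration only, not the official grading code: the function and field names are hypothetical. It assumes lateness is rounded up to whole days, and that an assignment exactly 24 hours late counts as one late day (the policy text leaves that boundary ambiguous). It also assumes remaining late days are used before any points are deducted.

```python
import math

MAX_LATE_DAYS_PER_ASSIGNMENT = 2  # no submissions after 48 hours
POINTS_PER_EXTRA_DAY = 2          # deduction once late days run out


def late_outcome(hours_late, late_days_remaining):
    """Return how a late homework submission would be handled.

    hours_late: hours past the original deadline (<= 0 means on time).
    late_days_remaining: unused late days the student still has.
    """
    if hours_late <= 0:
        return {"graded": True, "late_days_used": 0, "points_deducted": 0}

    # Assumption: any started 24-hour period counts as a full late day.
    days_needed = math.ceil(hours_late / 24)

    if days_needed > MAX_LATE_DAYS_PER_ASSIGNMENT:
        # More than 48 hours late: not graded, no late days consumed.
        return {"graded": False, "late_days_used": 0, "points_deducted": 0}

    # Assumption: free late days are applied first, then the penalty.
    free_days = min(days_needed, max(late_days_remaining, 0))
    extra_days = days_needed - free_days
    return {
        "graded": True,
        "late_days_used": free_days,
        "points_deducted": extra_days * POINTS_PER_EXTRA_DAY,
    }


if __name__ == "__main__":
    print(late_outcome(hours_late=10, late_days_remaining=6))
    print(late_outcome(hours_late=30, late_days_remaining=0))
    print(late_outcome(hours_late=50, late_days_remaining=6))
```

Under these assumptions, a student with one late day left who submits 30 hours late would use that late day and lose 2 points for the second day.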

Prerequisites

Students must have basic programming knowledge and statistics knowledge at the level of Stat 100 or above.

Required Textbook

None. Instead, we provide a list of recommended readings on the course website.

Course Website

GitHub

Course Communication

We will use Slack to organize course discussions. Each lecture will have its own channel, which the TAs will monitor during class. Feel free to ask questions during class or at any other time. Join Slack here! More information on how to use Slack is posted on our Resources page.

Schedule

This is a TENTATIVE schedule for the course.