You will work on a month-long data science project. The goal is to go through the complete data science process to answer questions you have about a topic of your own choosing. You will acquire the data, design your visualizations, run statistical analyses, and communicate the results.

Project Team

Students may work in teams of 2-4 and will collaborate closely with their teammates on this project. Canvas or Slack can be used to find prospective team members. In general, we expect all members of a group to receive the same grade. However, we reserve the right to assign different grades to individual group members based on peer assessments (see below).

Project Milestones

There are a few milestones for the final project. No extensions will be given for any of the project due dates, except for COVID-19 related emergencies or other unforeseen emergencies. Late days may not be used. Projects submitted after the final due date will not be graded. Students who anticipate any issues should email the teaching staff at least one week in advance.

Date                           Description
November 2 by 11:59pm EST      Form a team and submit a project proposal
November 2 - 20                Project review meeting with your assigned TA
December 13 by 11:59pm EST     RMarkdown and compiled HTML due
December 13 by 11:59pm EST     Peer assessment due
December 13 by 11:59pm EST     Project webpage and screencast due
December 16                    Project screencasts shown and best project prizes

Deliverables

There are several deliverables for your project. Each will be graded separately, and together they make up your final project score.

Team Registration and Proposal

Start by filling out a Google form to define your team and project proposal. This form should be filled out by 11:59pm EST on Monday, November 2, 2020. The title and other project details may be changed at a later date if needed. Each team only needs to submit one form. Based on the proposal, a TA will be assigned to each team and will guide the team through the rest of the project. Schedule a project review meeting with your assigned TA within the following three weeks (November 2-20, 2020), and make sure all of your team members are present at the meeting.

RMarkdown and HTML Files

An important part of the project is the RMarkdown file and its associated HTML file. These will detail the steps you took in developing your solution, including how you collected the data, alternative solutions you tried, the statistical methods you used, and the insights you gained. How the team got there is as important as the final results! The RMarkdown and HTML files are where you describe and document the space of possibilities explored at each step of the project. We strongly advise you to include many visualizations.

The RMarkdown file should include the following topics. Depending on the project type, the amount of discussion devoted to each will vary:

  • Overview and Motivation: Provide an overview of the project goals and the motivation for it. Consider that this will be read by people who did not see your project proposal.

  • Related Work: Anything that inspired you, such as a paper, a web site, or something we discussed in class.

  • Initial Questions: What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?

  • Data: Source, scraping method, cleanup, etc.

  • Exploratory Analysis: What visualizations did you use to look at your data in different ways? What are the different statistical methods you considered? Justify the decisions you made, and show any major changes to your ideas. How did you reach these conclusions?

  • Final Analysis: What did you learn about the data? How did you answer the questions? How can you justify your answers? Note that one type of analysis per team member is required. A Shiny app counts as a type of analysis.

As this will be your only chance to describe your project in detail, make sure that your RMarkdown file and compiled HTML file are standalone documents that fully describe your process and results. The RMarkdown and HTML files are due Sunday, December 13 by 11:59pm EST. For instructions on how to submit, please see Submission Instructions below.

Code

We expect you to write high-quality, readable R code in your RMarkdown file. Strive to do things the right way and think about aspects such as reproducibility, efficiency, and data cleaning. We also expect you to document your code.
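
As an illustration of these expectations, here is a minimal sketch of a documented, reproducible data-cleaning step in base R. The function name clean_data, the column names, and the toy data are hypothetical and only stand in for whatever your project actually uses; this is one possible style, not a required template.

```r
# Fix the random seed so that any step involving randomness can be reproduced.
set.seed(2020)

# Hypothetical helper: keep only the rows that have no missing values
# in the columns named in `required_cols`.
clean_data <- function(df, required_cols) {
  stopifnot(is.data.frame(df), all(required_cols %in% names(df)))
  complete <- stats::complete.cases(df[, required_cols, drop = FALSE])
  df[complete, , drop = FALSE]
}

# Toy data standing in for a real data set (values are made up).
raw <- data.frame(
  id    = 1:5,
  score = c(3.2, NA, 4.8, 5.1, NA),
  group = c("a", "b", "a", NA, "b")
)

cleaned <- clean_data(raw, required_cols = c("score", "group"))

# Record the R version and loaded packages so others can rebuild the environment.
sessionInfo()
```

Keeping cleaning logic in small, commented functions like this makes it easier for a reader of the compiled HTML to follow each decision and rerun it.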

Peer Assessment

It is important to provide positive feedback to people who truly worked hard for the good of the team and to also make suggestions to those you perceived not to be working as effectively on team tasks. We ask you to provide an honest assessment of the contributions of the members of your team, including yourself. The feedback you provide should reflect your judgment of each team member:

  • Preparation: were they prepared during team meetings?

  • Contribution: did they contribute productively to the team discussion and work?

  • Respect for others’ ideas: did they encourage others to contribute their ideas?

  • Flexibility: were they flexible when disagreements occurred?

Your teammates’ assessment of your contributions and the accuracy of your self-assessment will be considered as part of your overall project score. The peer assessment is due Sunday, December 13 by 11:59pm EST. For instructions on how to submit, please see Submission Instructions below.

Project Website

You will create a public website for your project using Google Sites, GitHub Pages, or any other web hosting service of your choice. The website should effectively summarize the main results of your project and tell a story. Consider your audience (the site is public) and keep the discussion at an appropriate level. The website should also link to the RMarkdown file, HTML file, and data in your GitHub repository (see below). Also embed your main visualizations and your screencast in your website.

The final project website is due Sunday, December 13 by 11:59pm EST. For instructions on how to submit, please see Submission Instructions below.

Project Screencast

Each team will create a two-minute screencast with narration showing a demo of your project and/or some slides. Information about how to prepare these screencasts can be found here. Please make sure that the sound quality of your video is good; it may be worthwhile to invest in an external USB microphone. Upload the video to an online video platform such as YouTube or Vimeo and embed it into your project web page. We will show the best videos in class.

We will strictly enforce the two-minute time limit for the video, so please make sure you are not running longer. Use principles of good storytelling and presentation to get your key points across. Focus the majority of your screencast on your main contributions rather than on technical details. What do you feel is the best part of your project? What insights did you gain? What is the single most important thing you would like your audience to take away? Make sure it is front and center rather than at the end.

The final project screencast is due Sunday, December 13 by 11:59pm EST. For instructions on how to submit, please see Submission Instructions below.

Submission Instructions

How to submit RMarkdown and HTML files (due Sunday, December 13 by 11:59pm EST)

  1. Create a GitHub repository that includes the data used for the final project, the RMarkdown file, and the compiled HTML file. If the data are too big to fit in the repository, make the data accessible somewhere online (Google Drive, downloadable link, etc.). At the top of the RMarkdown file, include instructions on where to access the data. If we cannot access your work or links because these directions were not followed correctly, we will not grade your work.
  2. You should only have one GitHub repository per team, but make sure the names of all group members are inside the RMarkdown file at the top.
  3. Email your TA instructions on where to access the data and the location of your GitHub repository.
  4. Add text to the README file to help visitors and the TAs navigate your repository.
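
One way to satisfy steps 1 and 2 is to put the team and project information in the YAML header at the top of the RMarkdown file. The sketch below is only an illustration: the title and member names are placeholders, and the exact layout is an assumption rather than a required format.

```yaml
---
title: "Project Title (placeholder)"
author:
  - "Team Member One (placeholder)"
  - "Team Member Two (placeholder)"
output: html_document
---
```

A short paragraph directly below this header can then state where the data are stored (for example, a folder in the repository or a download link) and how to load them.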

How to submit the Peer Assessment (due Sunday, December 13 by 11:59pm EST)

Each team member needs to fill out this Google form for the peer evaluation. Your individual project score will take into account your self-assessment and your peers' assessments.

How to submit the Website and Screencast (due Sunday, December 13 by 11:59pm EST)

Fill out this Google form to submit the links to the website and screencast. If we cannot access the website or screencast, we cannot grade it.

Grading

The final project is graded in two parts:

  1. Final Project Part I (worth 10% of total grade). This portion represents the Team Registration and Final Project Proposal which is due November 2 by 11:59pm EST.

  2. Final Project Part II (worth 25% of total grade). This portion will be split into two sub-portions:

    • 80% of the Final Project Part II will be based on the RMarkdown and HTML files in your GitHub repository. This includes the quality of your data analysis and R code, the complexity and level of difficulty of your project, and the completeness and overall functionality of your analysis. This sub-portion (and the peer assessment) is due Sunday, December 13 by 11:59pm EST.
    • 20% of the Final Project Part II will be based on your website and screencast and the quality of their storytelling. This sub-portion is due Sunday, December 13 by 11:59pm EST.
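
As a worked example of the weights above, and assuming each sub-portion's share of the total course grade is simply the product of the two percentages, the breakdown is:

```latex
\begin{align*}
\text{RMarkdown and HTML files} &: 0.80 \times 25\% = 20\% \text{ of the total grade} \\
\text{Website and screencast}   &: 0.20 \times 25\% = 5\%  \text{ of the total grade} \\
\text{Whole project (Parts I and II)} &: 10\% + 25\% = 35\% \text{ of the total grade}
\end{align*}
```

These figures do not include any adjustment to individual scores from the peer evaluations.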

Your individual project score will also be determined by your peer evaluations.

Example Final Projects

Here are some examples of successful final projects. Note: these projects come from a similar data science course we taught previously, which used Python rather than R.

  1. Predicting Hubway bike/dock availability (Website, Screencast)
  2. Across the Bay 10K Race (Website, Screencast)
  3. Predicting Citation Counts of arXiv Papers (Website, Screencast)
  4. Improving University Energy Efficiency: Building Energy Demand Prediction (Website, Screencast)
  5. Predicting AirBnb Success (Website, Screencast)