Final Project

General Instructions

For the final project you will choose one of these topics:

1. COVID-19 pandemic in the US.
2. Excess mortality in Puerto Rico after Hurricane María.
3. Topic of your choice (needs instructor approval).
4. Election results dashboard.

You will submit your project using Git. Your project should be completely reproducible, meaning all the code and data needed to render your report from scratch should be in the repository.

You can work alone or in groups of at most three people.

If you select option 4, you can skip to Election Dashboard.

Sections

For the first three options, you will prepare a comprehensive report in the style of an academic paper. The report is divided into the following five sections, each with an approximate word count to help you reach a target of 2,500 to 3,000 words. You may include up to four figures and up to two tables.

Abstract (150-200 words)

Purpose: The abstract provides a concise summary of your project, including its objectives, key findings, and significance. Write this section last, after completing all other sections, to accurately reflect your project’s focus and main results.
Guidelines: Limit this section to 150-200 words. Briefly outline the purpose of your study, the approach you used, and the primary results and conclusions. The abstract should be clear, succinct, and give readers an immediate understanding of what your project entails.

Introduction (500-600 words)

Purpose: The introduction sets the stage for your project, presenting the background and rationale for your analysis. Explain why the topic is significant.
Guidelines: Start with a broad overview of the topic, gradually narrowing down to your specific focus. Conclude with a clear statement of your research questions, hypotheses, or objectives. Use 2-3 paragraphs to establish a solid foundation for the rest of the paper.

Methods (600-700 words)

Purpose: This section details the data sources, methods, and analytical techniques you used to conduct your analysis. It should be specific enough that someone else could replicate your study using the same resources and approach.
Guidelines: Describe the dataset(s) you used, including information about data collection (e.g., sources, time frame). Outline your approach for cleaning and analyzing the data, including any statistical or computational methods applied. Clearly explain any assumptions or limitations in your approach.

Results (500-600 words)

Purpose: The results section presents the main findings of your analysis without interpretation. Organize the data logically to highlight key insights, using tables, figures, and charts to illustrate trends and comparisons.
Guidelines: For each result, briefly describe it and refer to relevant visuals or tables where appropriate. Do not provide explanations or discuss implications in this section; focus only on presenting the findings clearly and accurately.

Discussion (600-700 words)

Purpose: In the discussion, interpret the significance of your findings, explore potential implications, and relate the results back to your initial research questions or hypotheses. This section allows you to discuss any patterns, unexpected findings, or limitations and suggest possible future research.
Guidelines: Analyze your results in the context of your research question, linking them back to the background information from the introduction. Consider what your findings reveal, any limitations they may have, and how they might impact future work or policy. End with a brief conclusion summarizing your main insights.

Your final report should be professionally formatted, with each section clearly labeled and referenced. Aim for clarity, precision, and a well-organized presentation of your analysis.

Total Word Count: Approximately 2,500-3,000 words.

Supplementary Methods (no limit)

You can include a separate document titled Supplementary Methods.

Purpose: Share any mathematical derivations, data visualizations, or tables needed to justify the choices described in the Methods Section. You can also provide further support for the claims made in the Results Section. You can refer to this document in the main report.
Guidelines: There is no limit on the length of this section or on the number of figures and tables. However, be careful not to overwhelm the graders with too much information.

Examples

Here are two examples of papers related to the first two topics:

GitHub Repository

We recommend your repository include:

Directories code, data, and docs.
At least one script for wrangling in the code directory.
One file called final-project.qmd that can be rendered to produce the final report. It can be in the code directory or the repository's home directory, as long as it renders.
A README file explaining how to reproduce all the results.
Raw data, if you need to share it, in the raw-data directory. Alternatively, you can include code that downloads the necessary data from the internet.
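
The recommended layout can be checked before submitting. The following is a minimal sketch in plain Python, not a course-provided tool. The function name `missing_items`, the labels it returns, and the rule that any root-level file whose name starts with README counts are assumptions made for illustration. The raw-data directory is left out because it is optional.

```python
from pathlib import PurePosixPath

REQUIRED_DIRS = ("code", "data", "docs")
REPORT_NAME = "final-project.qmd"


def missing_items(repo_files):
    """Return the recommended items that are absent.

    repo_files: repo-relative file paths with forward slashes,
    for example the lines printed by `git ls-files`.
    """
    paths = [PurePosixPath(p) for p in repo_files if p.strip()]

    # A directory counts as present if at least one tracked file lives in it.
    top_dirs = {p.parts[0] for p in paths if len(p.parts) > 1}
    missing = [f"{d}/" for d in REQUIRED_DIRS if d not in top_dirs]

    # At least one script in code/ besides the report itself.
    if not any(len(p.parts) > 1 and p.parts[0] == "code" and p.name != REPORT_NAME
               for p in paths):
        missing.append("script in code/")

    # The report may sit in the repository root or directly in code/.
    def is_report(p):
        in_root = len(p.parts) == 1
        in_code = len(p.parts) == 2 and p.parts[0] == "code"
        return p.name == REPORT_NAME and (in_root or in_code)

    if not any(is_report(p) for p in paths):
        missing.append(REPORT_NAME)

    # README, README.md, README.txt, ... in the repository root.
    if not any(len(p.parts) == 1 and p.name.upper().startswith("README")
               for p in paths):
        missing.append("README")

    return missing


if __name__ == "__main__":
    example = ["code/wrangle.R", "data/deaths.csv", "docs/notes.md",
               "final-project.qmd", "README.md"]
    print(missing_items(example))
```

An empty list means every recommended item was found. The check says nothing about whether the report actually renders, which still has to be tested from a fresh clone.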

We expect to see at least five commits by each person.

COVID-19

Data

You can use the data we downloaded in problem set 4.

We want you to examine the entire pandemic, until at least December 1, 2024, which means you will need to obtain population estimates for 2023 and 2024.

For those working in groups, we want you to obtain daily or weekly mortality data for each state.

Tasks and questions

Divide the pandemic period, January 2020 to December 2024, into waves. Justify your choice with data visualization.
For each period, compute the death rates by state. Describe which states did better or worse during the different periods.
Describe whether COVID-19 became less or more virulent across the different periods.
For those working in groups: Estimate excess mortality for each week for each state. Do COVID-19 deaths explain the excess mortality?
For those working in groups: Repeat the second task (death rates by state for each period), but for excess mortality instead of COVID-19 deaths.
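
The death-rate comparison by state and period can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not a prescribed method: the function name `death_rates`, the integer week index, and the toy numbers are all hypothetical. Dividing by the number of weeks in each wave is one possible way to make waves of different lengths comparable. A single population per state is used here for brevity, whereas the project calls for yearly population estimates.

```python
from collections import defaultdict


def death_rates(weekly, population, waves, per=100_000):
    """Deaths per `per` people per week, for each (state, wave).

    weekly:     iterable of (state, week_index, deaths)
    population: {state: population size} (assumed constant here)
    waves:      {wave_name: (first_week, last_week)}, both ends inclusive
    """
    totals = defaultdict(float)
    for state, week, deaths in weekly:
        for name, (first, last) in waves.items():
            if first <= week <= last:
                totals[(state, name)] += deaths

    rates = {}
    for (state, name), deaths in totals.items():
        first, last = waves[name]
        n_weeks = last - first + 1
        # Scale to a rate, then divide by wave length in weeks.
        rates[(state, name)] = deaths * per / population[state] / n_weeks
    return rates


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    weekly = [("A", 0, 10), ("A", 1, 30), ("A", 2, 5),
              ("B", 0, 8), ("B", 1, 12), ("B", 2, 4)]
    population = {"A": 1_000_000, "B": 500_000}
    waves = {"wave 1": (0, 1), "lull": (2, 2)}
    for key, rate in sorted(death_rates(weekly, population, waves).items()):
        print(key, round(rate, 2))
```

The same function applies to the excess-mortality version of the task if the weekly deaths are replaced by weekly excess deaths.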

Excess mortality

Data

You can use puerto_rico_counts in the excessmort package.

For those working in groups, we want you to wrangle the data from the PDF report shared by the New York Times: https://github.com/c2-d2/pr_mort_official/raw/master/data/Mortalidad-RegDem-2015-17-NYT-part1.pdf

Tasks and questions

Examine the population sizes by age group and sex. Describe any interesting patterns.
Use data from before 2017 to estimate expected mortality and a standard deviation for each week. Do this by age group and sex. Describe tendencies you observe. You can combine data into bigger age groups if the data show they have similar death rates.
Explore the data to see if there are periods during or before 2017 that appear to have excess mortality. If so, explain and recompute expected death rates removing these periods.
Estimate excess deaths for each week of 2017-2018. Make sure you define the weeks so that one of the weeks starts the day María made landfall. Comment on excess mortality. Which age groups were affected? Were men and women affected differently?
For those working in groups: Extract the data from the PDF shared with the New York Times. Comment on how well it matches the data in the excessmort package.
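
Two steps in the tasks above lend themselves to a short sketch: defining weeks so that one of them starts on the day of landfall (September 20, 2017), and turning observed and reference counts into an excess estimate. The code below is a minimal illustration in plain Python, not the method used by the excessmort package. The function names are hypothetical, and the reference values stand for counts from comparable weeks in earlier years, which the analyst has to choose. When population changes over time, the same arithmetic should be applied to rates rather than counts.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, stdev

LANDFALL = date(2017, 9, 20)  # Hurricane María's landfall in Puerto Rico


def week_index(day, anchor=LANDFALL):
    """Week number relative to the anchor.

    Week 0 starts on the anchor day, week 1 seven days later,
    and week -1 covers the seven days before the anchor.
    """
    return (day - anchor).days // 7


def weekly_totals(daily, anchor=LANDFALL):
    """Sum daily counts {date: deaths} into {week_index: deaths}."""
    totals = defaultdict(int)
    for day, deaths in daily.items():
        totals[week_index(day, anchor)] += deaths
    return dict(totals)


def excess_summary(observed, reference):
    """Excess deaths and a z-score for one week.

    observed:  count for the week of interest
    reference: counts for comparable weeks in earlier years (two or more)
    """
    expected = mean(reference)
    sd = stdev(reference)  # sample standard deviation
    excess = observed - expected
    return {"expected": expected, "sd": sd,
            "excess": excess, "z": excess / sd}


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    daily = {date(2017, 9, 19): 80, date(2017, 9, 20): 90,
             date(2017, 9, 26): 100, date(2017, 9, 27): 110}
    print(weekly_totals(daily))
    print(excess_summary(700, [540, 560, 550]))
```

With only a few reference years the standard deviation is itself noisy, which is one reason to pool information across neighboring weeks or fit a smooth seasonal model instead.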

Election Dashboard

Data

Obtain pollster data for the 2016, 2018, 2020, and 2022 elections. We already provided data from the 2024 election.

Tasks and questions

We want you to build a Shiny dashboard to be ready to predict the 2026 Senate election. To do this, you will prepare a dashboard for the 2024 Senate election.
The dashboard should show your prediction for each Senate race based on a Bayesian analysis.
The dashboard should also show a prediction for the number of seats the Democrats will hold. It should display a histogram with your estimate of the probability distribution for each possible outcome.
Use data from previous years to motivate the choice of priors.
For those working in groups: Use the previous four elections to estimate correlations between states in the polling bias. Use them to improve your model.
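
The per-race prediction and the seat histogram can be sketched as follows. This is a minimal illustration in plain Python of one common choice, a normal prior combined with a normal likelihood for the poll average. It is not necessarily the model expected for the project. All names are hypothetical. The spread is assumed to be the Democratic minus Republican margin in percentage points, `safe_seats` stands for seats not being contested or treated as certain, and `shared_bias_sd` is a crude stand-in for correlated polling bias (one common shift added to every race), not an estimated state-by-state correlation.

```python
import random
from collections import Counter


def posterior(prior_mean, prior_sd, poll_mean, poll_se):
    """Normal prior times normal likelihood gives a normal posterior.

    Returns (posterior mean, posterior sd) for the spread in one race.
    """
    weight = prior_sd**2 / (prior_sd**2 + poll_se**2)  # weight on the polls
    mean = prior_mean + weight * (poll_mean - prior_mean)
    sd = (1 / (1 / prior_sd**2 + 1 / poll_se**2)) ** 0.5
    return mean, sd


def seat_distribution(races, safe_seats, n_sim=10_000,
                      shared_bias_sd=0.0, seed=1):
    """Simulate the number of seats held and return {seats: probability}.

    races: list of (posterior mean, posterior sd) for each contested race
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    counts = Counter()
    for _ in range(n_sim):
        bias = rng.gauss(0, shared_bias_sd)  # same shift for all races
        wins = sum(rng.gauss(m, s) + bias > 0 for m, s in races)
        counts[safe_seats + wins] += 1
    return {seats: n / n_sim for seats, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    races = [posterior(0.0, 5.0, 3.0, 2.0),
             posterior(0.0, 5.0, -1.0, 2.5),
             posterior(0.0, 5.0, 0.5, 3.0)]
    for seats, prob in seat_distribution(races, safe_seats=45,
                                         shared_bias_sd=2.0).items():
        print(seats, round(prob, 3))
```

The returned dictionary holds the heights of the histogram bars. Setting `shared_bias_sd` above zero widens the distribution of seats because races then tend to miss in the same direction.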

GitHub

All the code needed to construct the Shiny app should be included in a GitHub repository, including whatever wrangling you performed for your data analysis.