Final Project

General Instructions

You have been assigned to a team of 3 or 4 students and a Teaching Fellow (TF) who will supervise your project. Your team will select one project approach from the options below and report to your assigned TF throughout the semester.

Important: All project communication should go through your assigned TF. Do not contact the instructors directly about project questions.

Team | TF | Team Slack Channel | Members
Team 1 | Angela | #bst-260-project-team1 | Motohiko Adomi, Jethro Au, Kate Bucci, Anthony Candelmo
Team 2 | Angela | #bst-260-project-team2 | Tiger Chaisutyakorn, Yixiao Chen, EK Cheng, Huihan Cui
Team 3 | Angela | #bst-260-project-team3 | Sylvia Deng, Yinuo Du, Qianyu Fan
Team 4 | Angela | #bst-260-project-team4 | Tanvi Gaitonde, Gabrielle Gonzalez, Allen Gu, Camila Guetter
Team 5 | Angela | #bst-260-project-team5 | Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu
Team 6 | Ava | #bst-260-project-team6 | Anthea Hua, Jiaming Huang, Helen Keetley, Helena Li
Team 7 | Ava | #bst-260-project-team7 | Stellen Li, Yutong Li, Jason Liang, Julia Lin
Team 8 | Ava | #bst-260-project-team8 | Siyan Lin, Cindy Liu, Jasper Liu, Junhao Luo
Team 9 | Ava | #bst-260-project-team9 | Peng Luo, Yuxi Luo, Adriana Manjon, Neel Mirani
Team 10 | Ava | #bst-260-project-team10 | Bao Han Ngo, Ryan Ou, Katelyn Power, Chloe Qiu
Team 11 | Emma | #bst-260-project-team11 | Yvonne Qiu, Ziyue Qiu, Varshini Ramanathan, April Ren
Team 12 | Emma | #bst-260-project-team12 | Aanika Schueler, Emily Shen, Shriya Sai Shivakumar, Erica Song
Team 13 | Emma | #bst-260-project-team13 | Rahul Srinivasaragavan, Nyah Strickland, Christina Wang, Hengyuan Wang
Team 14 | Emma | #bst-260-project-team14 | Siwen Wang, Yuanshu Wang, Emily Weng, Andrew Wu
Team 15 | Emma | #bst-260-project-team15 | Kai Wu, Yanting Wu, Zhentian Wu, Baoyue Xing
Team 16 | Jing | #bst-260-project-team16 | Lavinia Xu, Xinyu Xu, Xu Yan, Cuiqiyun Yang
Team 17 | Jing | #bst-260-project-team17 | Yuntian Yang, Haochen Ye, Haihan Yuan, Yiwei Yun
Team 18 | Jing | #bst-260-project-team18 | Irene Zhang, Iris Zhang, Yiyang Zhang, Zihan Zhang
Team 19 | Jing | #bst-260-project-team19 | Johnny Zhao, Mengze Zhao, Tianyu Zhao, Yinuo Zhao
Team 20 | Jing | #bst-260-project-team20 | Zifan Zhao, Junyi Zhou, Zi Zhu

Note: All project communication should happen with your TF in your team's Slack channel listed above.

Grading: The final project accounts for 20% of your course grade, divided as follows:

  • Final Project Report: 10% of course grade
  • Oral Presentation: 10% of course grade

GitHub Repository Setup

Creating Your Team Repository

Repository Naming Convention: bst-260-2025F-team# (where # is your team number, e.g., bst-260-2025F-team1, bst-260-2025F-team15)

Setup Process:

  1. One team member creates the repository on GitHub with the exact naming convention above
  2. Make the repository private
  3. Add all team members as collaborators with write access
  4. Add your assigned TF as a collaborator with write access
  5. Set up the initial directory structure (see requirements below)

Notifying Your TF

Once your repository is created:

  1. Post the repository URL in your team’s Slack channel (see table above)
  2. Include your team number and all team member names
  3. Your TF will confirm access and provide initial feedback

Example Slack message:

Team 5 Repository: https://github.com/username/bst-260-2025F-team5
Members: Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu
Ready for initial review of `Outline.txt`

Communication Guidelines

All project communication must go through your assigned TF using your team’s private Slack channel:

  • Your TF is your primary point of contact for all project matters
  • TFs will approve your Outline.txt and provide ongoing feedback
  • TFs will review and approve your final project submission
  • TFs will conduct and evaluate your oral presentation
  • Only your TF can escalate issues to instructors if necessary

Project Framework: NHANES Data Analysis

All teams will work with the NHANES (National Health and Nutrition Examination Survey) dataset using the NHANES R package. You must choose between two versions and justify your choice:

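  • NHANES: A resampled version of the survey data that can be treated as a simple random sample from the US population (recommended for most analyses)
  • NHANESraw: The raw survey data, including the sampling weights and survey design variables (recommended for teams interested in epidemiological methods using survey weights)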

Project Differentiation: Each team must have a distinct project through different age groups, outcomes, or methodological approaches.

Project Planning Requirements

Within one week of team assignment, you must:

  1. Create your GitHub repository (see setup instructions above)
  2. Choose your specific project focus (see options below)
  3. Create a file called Outline.txt (plain text) in your GitHub repository with:
    • Your analysis plan and research questions
    • Team member responsibilities for each activity
    • Weekly breakdown of tasks and deliverables
  4. Notify your TF via Slack when ready for review and feedback

You should update this outline document throughout the project to track progress and any changes to your plan.

Project Timeline & Deadlines

Milestone | Deadline | Responsibility
Outline.txt Submission | October 31, 2025 | Teams post in their Slack channel
Outline.txt Feedback | November 7, 2025 | TFs provide approval/revision requests
Revised Outline (if needed) | November 12, 2025 | Teams address TF feedback
Oral Presentation Scheduling | December 12, 2025 | Teams schedule with TF for December 16-19 window
Final Project Submission | December 15, 2025 | Teams notify TF when repository is complete

Important: All deadlines are firm. Late submissions will result in grade penalties.

Available Project Approaches

Option 1: Age-Specific Health Analysis

Focus on one of these age groups with appropriate health outcomes:

  • Ages 2-18: Growth patterns, childhood health indicators (note: different BMI/growth relationships than adults)
  • Ages 19-40: Young adult health patterns, lifestyle factors
  • Ages 40+: Disease risk factors, aging-related health patterns

Key considerations: Include important covariates (age, sex, race) and their interactions. Consult relevant literature for risk factors specific to your chosen age group and outcome.

Option 2: Statistical Methodology Focus

Instead of focusing on specific health outcomes, examine statistical approaches:

  • Multiple imputation methods for handling missing data patterns
  • Spline models for interpretable non-linear associations with continuous variables
  • Survey methods for population-level estimates and inference
  • Machine learning approaches (logistic regression, random forests) for data structure exploration
  • Interactive applications using Shiny/Posit Connect for data exploration tools

Option 3: Missing Data Analysis

Investigate missing data patterns in NHANES:

  • Understand why observations are missing for different variables
  • Apply modern missing data methods
  • Compare analytical approaches under different missingness assumptions

Option 4: Custom Research Question

Propose your own research question using NHANES data, subject to TF approval. Your proposal must demonstrate a clear analytical approach and feasibility.

Important Data Considerations

Missing Values: Many NHANES variables have substantial missing data. Consider whether this affects your analysis or could be the focus of your study.

Outcome Variables: Examples include systolic blood pressure, diabetes risk, depression scores, sleep patterns, smoking status, BMI. Continuous outcomes may be easier to model initially.

Project Submission Requirements

You will submit your project using Git. Your project should be completely reproducible, meaning all the code and data needed to render your report from scratch should be in the repository.

Required Submissions to your TF:

  1. GitHub Repository: Submit the link to your team’s GitHub repository with all components below

  2. Oral Presentation: Each team must schedule a 20-minute oral presentation with their assigned TF via Zoom. Scheduling must be completed by December 12, 2025, for presentations during December 16-19, 2025. All team members must be present during the presentation. The TF will ask each team member specific questions about different aspects of the project based on their individual contributions detailed in the contribution summary document (e.g., if a member contributed to data analysis, they may be asked about model choice, coding decisions, statistical methods, etc.).

Oral Presentation Evaluation Rubric

The oral presentation will be evaluated as a group grade out of 10 points based on the following criteria:

Score | Criteria
0-1 | No meeting scheduled or major absence of team members
2-3 | Limited understanding of project components; significant gaps in explanations
4-5 | Moderate understanding of project components; difficulty explaining individual contributions/methodological choices
6-7 | Good understanding of project components; able to explain most individual contributions/methodological choices
8-9 | Very good understanding of all project components; strong defense of analytical approaches with minimal gaps
10 | Excellent understanding of all project components; exceptional ability to defend and discuss all aspects of the analysis

Final Report Evaluation Rubric

The final report will be evaluated as a group grade out of 10 points based on the following criteria:

Score | Criteria
9-10 (Excellent) | Clear, well-structured report with sophisticated analysis.
8-8.9 (Very Good) | Well-written report with good analysis.
7-7.9 (Good) | Adequate report structure with acceptable analysis.
6-6.9 (Satisfactory) | Report meets basic requirements but has several areas for improvement.
4-5.9 (Needs Improvement) | Report has significant structural problems or analysis errors.
0-3.9 (Inadequate) | Report does not meet basic requirements.

Detailed Criteria:

9-10 Points (Excellent): Clear, well-structured report with sophisticated analysis. All sections meet word count requirements. Exceptional use of statistical methods, excellent data visualization, and insightful interpretation. Professional formatting with proper citations. Demonstrates deep understanding of NHANES data and chosen methodology.

8-8.9 Points (Very Good): Well-written report with good analysis. Minor issues in structure or presentation. Good use of statistical methods and visualizations. Most sections meet requirements with solid interpretation of results.

7-7.9 Points (Good): Adequate report structure with acceptable analysis. Some sections may be slightly under/over word count. Basic statistical methods applied correctly. Visualizations present but could be improved. Interpretation shows understanding but lacks depth.

6-6.9 Points (Satisfactory): Report meets basic requirements but has several areas for improvement. Statistical analysis is basic or contains minor errors. Limited interpretation of results. Some formatting or structural issues.

4-5.9 Points (Needs Improvement): Report has significant structural problems or analysis errors. Poor data visualization or interpretation. Major sections missing or substantially under word count. Limited understanding of methodology.

0-3.9 Points (Inadequate): Report does not meet basic requirements. Major analysis errors, missing key sections, or demonstrates poor understanding of the data and methods. Unprofessional presentation.

TF Feedback Process

Your TF will provide detailed feedback and final grades through GitHub Issues:

  1. After your final submission, your TF will create Issues in your repository for:

    • Overall project feedback and final grade breakdown
    • Specific comments on analysis, methods, or presentation
    • Individual contribution assessment based on commit history and contribution files
  2. Grade breakdown will include:

    • Final Report Score (out of 10 points)
    • Oral Presentation Score (out of 10 points)
    • Overall Final Project Grade calculation
  3. Teams can respond to TF feedback through Issue comments if clarification is needed

Note: Check your repository for Issues after the final project deadline for comprehensive feedback and grades.

Report Structure

You will prepare a comprehensive report following the style of an academic paper. The report is divided into the following five sections, each with an approximate word count to help you reach a target of 2,500 to 3,000 words. It may include up to four figures and up to two tables.

Abstract (150-200 words)

  • Purpose: The abstract provides a concise summary of your project, including its objectives, key findings, and significance. Write this section last, after completing all other sections, to accurately reflect your project’s focus and main results.
  • Guidelines: Limit this section to 150-200 words. Briefly outline the purpose of your study, the approach you used, and the primary results and conclusions. The abstract should be clear, succinct, and give readers an immediate understanding of what your project entails.

Introduction (500-600 words)

  • Purpose: The introduction sets the stage for your project, presenting the background and rationale for your analysis. Explain why the topic is significant and justify your choice of NHANES vs. NHANESraw.
  • Guidelines: Start with a broad overview of the topic, gradually narrowing down to your specific focus. Conclude with a clear statement of your research questions, hypotheses, or objectives. Use 2-3 paragraphs to establish a solid foundation for the rest of the paper.

Methods (600-700 words)

  • Purpose: This section details the data sources, methods, and analytical techniques you used to conduct your analysis. It should be specific enough that someone else could replicate your study using the same resources and approach.
  • Guidelines: Describe the NHANES dataset version you used and justify your choice. Outline your approach for cleaning and analyzing the data, including any statistical or computational methods applied. Clearly explain any assumptions or limitations in your approach, particularly regarding missing data handling.

Results (500-600 words)

  • Purpose: The results section presents the main findings of your analysis without interpretation. Organize the data logically to highlight key insights, using tables, figures, and charts to illustrate trends and comparisons.
  • Guidelines: For each result, briefly describe it and refer to relevant visuals or tables where appropriate. Do not provide explanations or discuss implications in this section; focus only on presenting the findings clearly and accurately.

Discussion (600-700 words)

  • Purpose: In the discussion, interpret the significance of your findings, explore potential implications, and relate the results back to your initial research questions or hypotheses. This section allows you to discuss any patterns, unexpected findings, or limitations and suggest possible future research.
  • Guidelines: Analyze your results in the context of your research question and relevant health literature. Consider what your findings reveal, any limitations they may have (particularly regarding missing data or survey design), and how they might impact future work or policy. End with a brief conclusion summarizing your main insights.

Your final report should be professionally formatted, with each section clearly labeled and referenced. Aim for clarity, precision, and a well-organized presentation of your analysis.

Total Word Count: Approximately 2,500-3,000 words.

Use of AI Tools

Students are welcome to use AI tools as a complementary aid, but they must clearly state in their report where AI was used (e.g., text generation, editing, data analysis suggestions, or AI-assisted conclusions). AI should serve as a productivity and learning tool, not as the primary author of the report.

Supplementary Methods (no limit)

You can include a separate document titled Supplementary Methods.

  • Purpose: Share any mathematical derivations, data visualizations, or tables needed to justify the choices described in the Methods Section. You can also provide further support for the claims made in the Results Section. You can refer to this document in the main report.

  • Guidelines: There is no limit on the length of this section or on the number of figures and tables. However, be careful not to overwhelm the graders with too much information.

GitHub Repository Requirements

Your repository must include:

  • Directory structure: code, data, and docs directories
  • Analysis scripts: At least one script for data wrangling in the code directory
  • Main report: One file called final-project.qmd that renders to produce the final report (can be in the code directory or the repository root)
  • README file: Explaining how to reproduce all results
  • Project outline: Outline.txt with your initial plan and progress updates
  • Individual contribution files: Each team member creates a file with their name documenting their specific contributions and effort (hours/week)
  • Data handling: Include code that loads NHANES data via the R package (no need to store raw data)
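
As a rough illustration of the layout above, the following shell sketch creates the required directories and empty placeholder files inside a local clone. Only the directory names, Outline.txt, and final-project.qmd come from the requirements; the README.md extension, the code/wrangle.R script name, the contribution file name, and the team5 repository name are assumptions used for illustration.

```shell
# Sketch only: adjust REPO_DIR to the path of your own clone.
REPO_DIR="bst-260-2025F-team5"   # hypothetical team repository name

# Required directories.
mkdir -p "$REPO_DIR/code" "$REPO_DIR/data" "$REPO_DIR/docs"

# Git does not track empty directories, so add placeholder files.
# (.gitkeep is a common convention, not a Git feature.)
touch "$REPO_DIR/data/.gitkeep" "$REPO_DIR/docs/.gitkeep"

# Required files, created empty only if they do not exist yet.
# README.md and code/wrangle.R are assumed names; the contribution
# file name is a hypothetical pattern, one file per team member.
for f in README.md Outline.txt final-project.qmd code/wrangle.R \
         docs/contribution-firstname-lastname.md; do
  [ -e "$REPO_DIR/$f" ] || : > "$REPO_DIR/$f"
done
```

Here final-project.qmd is placed in the repository root; placing it in the code directory is equally acceptable under the requirements.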

Git Requirements: We expect to see at least five meaningful commits by each person demonstrating collaborative development throughout the project timeline.
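
One way a team might check its progress toward the commit requirement is sketched below. The function names are hypothetical, and the check only counts commits per author identity on the current branch; it cannot tell whether a commit is meaningful, and a member who commits under two different name or email settings will appear twice.

```shell
# Sketch only: count non-merge commits per author in a repository.
# $1 is the repository path (defaults to the current directory).
commits_per_author() {
  # -s: counts only, -n: sort by count, -e: include email.
  # HEAD is given explicitly so shortlog does not wait on stdin.
  git -C "${1:-.}" shortlog -s -n -e --no-merges HEAD
}

# Print authors below a minimum (default 5) and return non-zero
# if any author falls short.
check_min_commits() {
  commits_per_author "${1:-.}" |
    awk -v min="${2:-5}" '
      $1 < min { print "below minimum: " $0; bad = 1 }
      END { exit bad }
    '
}
```

For example, running `check_min_commits . 5` from the repository root lists any author with fewer than five commits.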