Final Project
General Instructions
You have been assigned to a team of 3 or 4 students and a Teaching Fellow (TF) who will supervise your project. Your team will select one project approach from the options below and report to your assigned TF throughout the semester.
Important: All project communication should go through your assigned TF. Do not contact the instructors directly about project questions.
| Team | TF | Team Slack Channel | Members |
|---|---|---|---|
| Team 1 | Angela | #bst-260-project-team1 | Motohiko Adomi, Jethro Au, Kate Bucci, Anthony Candelmo |
| Team 2 | Angela | #bst-260-project-team2 | Tiger Chaisutyakorn, Yixiao Chen, EK Cheng, Huihan Cui |
| Team 3 | Angela | #bst-260-project-team3 | Sylvia Deng, Yinuo Du, Qianyu Fan |
| Team 4 | Angela | #bst-260-project-team4 | Tanvi Gaitonde, Gabrielle Gonzalez, Allen Gu, Camila Guetter |
| Team 5 | Angela | #bst-260-project-team5 | Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu |
| Team 6 | Ava | #bst-260-project-team6 | Anthea Hua, Jiaming Huang, Helen Keetley, Helena Li |
| Team 7 | Ava | #bst-260-project-team7 | Stellen Li, Yutong Li, Jason Liang, Julia Lin |
| Team 8 | Ava | #bst-260-project-team8 | Siyan Lin, Cindy Liu, Jasper Liu, Junhao Luo |
| Team 9 | Ava | #bst-260-project-team9 | Peng Luo, Yuxi Luo, Adriana Manjon, Neel Mirani |
| Team 10 | Ava | #bst-260-project-team10 | Bao Han Ngo, Ryan Ou, Katelyn Power, Chloe Qiu |
| Team 11 | Emma | #bst-260-project-team11 | Yvonne Qiu, Ziyue Qiu, Varshini Ramanathan, April Ren |
| Team 12 | Emma | #bst-260-project-team12 | Aanika Schueler, Emily Shen, Shriya Sai Shivakumar, Erica Song |
| Team 13 | Emma | #bst-260-project-team13 | Rahul Srinivasaragavan, Nyah Strickland, Christina Wang, Hengyuan Wang |
| Team 14 | Emma | #bst-260-project-team14 | Siwen Wang, Yuanshu Wang, Emily Weng, Andrew Wu |
| Team 15 | Emma | #bst-260-project-team15 | Kai Wu, Yanting Wu, Zhentian Wu, Baoyue Xing |
| Team 16 | Jing | #bst-260-project-team16 | Lavinia Xu, Xinyu Xu, Xu Yan, Cuiqiyun Yang |
| Team 17 | Jing | #bst-260-project-team17 | Yuntian Yang, Haochen Ye, Haihan Yuan, Yiwei Yun |
| Team 18 | Jing | #bst-260-project-team18 | Irene Zhang, Iris Zhang, Yiyang Zhang, Zihan Zhang |
| Team 19 | Jing | #bst-260-project-team19 | Johnny Zhao, Mengze Zhao, Tianyu Zhao, Yinuo Zhao |
| Team 20 | Jing | #bst-260-project-team20 | Zifan Zhao, Junyi Zhou, Zi Zhu |
Note: All project communication should happen in this channel with your TF.
Grading: The final project accounts for 20% of your course grade, divided as follows:
- Final Project Report: 10% of course grade
- Oral Presentation: 10% of course grade
GitHub Repository Setup
Creating Your Team Repository
Repository Naming Convention: bst-260-2025F-team# (where # is your team number, e.g., bst-260-2025F-team1, bst-260-2025F-team15)
Setup Process:
- One team member creates the repository on GitHub with the exact naming convention above
- Make the repository private
- Add all team members as collaborators with write access
- Add your assigned TF as a collaborator with write access
- Set up the initial directory structure (see requirements below)
Notifying Your TF
Once your repository is created:
- Post the repository URL in your team's Slack channel (see table above)
- Include your team number and all team member names
- Your TF will confirm access and provide initial feedback
Example Slack message:
Team 5 Repository: https://github.com/username/bst-260-2025F-team5
Members: Siwei Guo, Hannah Hamling, Shuying Han, Runpeng Hu
Ready for initial review of `Outline.txt`
Communication Guidelines
All project communication must go through your assigned TF using your team’s private Slack channel:
- Your TF is your primary point of contact for all project matters
- TFs will approve your `Outline.txt` and provide ongoing feedback
- TFs will review and approve your final project submission
- TFs will conduct your oral evaluation
- Only your TF can escalate issues to instructors if necessary
Project Framework: NHANES Data Analysis
All teams will work with the NHANES (National Health and Nutrition Examination Survey) dataset using the NHANES R package. You must choose between two versions and justify your choice:
- NHANES: Data resampled so that it can be analyzed as a simple random sample of the US population (recommended for most analyses)
- NHANESraw: The raw survey data, including sampling weights and design variables (recommended for teams interested in epidemiological methods using survey weights)
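To help with the choice between the two versions, the following is a minimal sketch that loads both data frames and lists the columns found only in `NHANESraw`. It assumes the NHANES package is already installed; the object name `raw_only` is illustrative.

```r
# Assumes the NHANES package is installed: install.packages("NHANES")
library(NHANES)

data(NHANES)     # version intended for most analyses
data(NHANESraw)  # raw survey data with design variables

# Compare the size of the two data frames
dim(NHANES)
dim(NHANESraw)

# Columns that appear only in NHANESraw
# (sampling weights and survey design variables)
raw_only <- setdiff(names(NHANESraw), names(NHANES))
print(raw_only)
```

If your research question depends on the columns printed at the end, that is a reason to consider `NHANESraw`.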
Project Differentiation: Each team must have a distinct project through different age groups, outcomes, or methodological approaches.
Project Planning Requirements
Within one week of team assignment, you must:
- Create your GitHub repository (see setup instructions above)
- Choose your specific project focus (see options below)
- Create a file called `Outline.txt` (plain text) in your GitHub repository with:
  - Your analysis plan and research questions
  - Team member responsibilities for each activity
  - Weekly breakdown of tasks and deliverables
- Notify your TF via Slack when ready for review and feedback
You should update this outline document throughout the project to track progress and any changes to your plan.
Project Timeline & Deadlines
| Milestone | Deadline | Responsibility |
|---|---|---|
| `Outline.txt` Submission | October 31, 2025 | Teams post in their Slack channel |
| `Outline.txt` Feedback | November 7, 2025 | TFs provide approval/revision requests |
| Revised Outline (if needed) | November 12, 2025 | Teams address TF feedback |
| Oral Presentation Scheduling | December 12, 2025 | Teams schedule with TF for December 16-19 window |
| Final Project Submission | December 15, 2025 | Teams notify TF when repository is complete |
Important: All deadlines are firm. Late submissions will result in grade penalties.
Available Project Approaches
Option 1: Age-Specific Health Analysis
Focus on one of these age groups with appropriate health outcomes:
- Ages 2-18: Growth patterns, childhood health indicators (note: different BMI/growth relationships than adults)
- Ages 19-40: Young adult health patterns, lifestyle factors
- Ages 40+: Disease risk factors, aging-related health patterns
Key considerations: Include important covariates (age, sex, race) and their interactions. Consult relevant literature for risk factors specific to your chosen age group and outcome.
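As one possible starting point for Option 1, the sketch below restricts the data to a single age group and fits a linear model with an age-by-sex interaction. The outcome (`BPSysAve`), the 40+ age group, and the model form are illustrative assumptions, not a required specification; `Age`, `Gender`, and `Race1` are columns in the NHANES package data.

```r
library(NHANES)
data(NHANES)

# Restrict to the 40+ age group (one of the three options above)
adults40 <- subset(NHANES, Age >= 40)

# Linear model for average systolic blood pressure with an age-by-sex
# interaction plus race; lm() drops rows with missing values by default
fit <- lm(BPSysAve ~ Age * Gender + Race1, data = adults40)

# Coefficient table: estimates, standard errors, t values, p-values
summary(fit)$coefficients
```

Because `lm()` silently drops incomplete rows, it is worth checking how many observations were actually used, for example with `nobs(fit)`.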
Option 2: Statistical Methodology Focus
Instead of focusing on specific health outcomes, examine statistical approaches:
- Multiple imputation methods for handling missing data patterns
- Spline models for interpretable non-linear associations with continuous variables
- Survey methods for population-level estimates and inference
- Machine learning approaches (logistic regression, random forests) for data structure exploration
- Interactive applications using Shiny/Posit Connect for data exploration tools
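For teams considering the survey methods item, here is a hedged sketch of a design-based estimate from `NHANESraw`. It assumes the `survey` package (not named in these instructions) is installed, and it halves the 2-year exam weights because `NHANESraw` pools two 2-year cycles; confirm the appropriate weight for your variables with your TF. The names `des`, `bp_mean`, and `WTMEC4YR` are illustrative.

```r
library(NHANES)
library(survey)
data(NHANESraw)

# NHANESraw pools two 2-year cycles, so the 2-year exam weights are halved
# here to approximate 4-year weights (an assumption to verify for your analysis)
NHANESraw$WTMEC4YR <- NHANESraw$WTMEC2YR / 2

# Declare the complex survey design: PSUs nested within strata
des <- svydesign(
  ids     = ~SDMVPSU,
  strata  = ~SDMVSTRA,
  weights = ~WTMEC4YR,
  nest    = TRUE,
  data    = NHANESraw
)

# Weighted mean of average systolic blood pressure with its standard error
bp_mean <- svymean(~BPSysAve, design = des, na.rm = TRUE)
print(bp_mean)
```

The standard error reported here accounts for the clustering and stratification, which a plain `mean()` on the raw rows would ignore.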
Option 3: Missing Data Analysis
Investigate missing data patterns in NHANES:
- Understand why observations are missing for different variables
- Apply modern missing data methods
- Compare analytical approaches under different missingness assumptions
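A first look at missingness can be done with base R alone. The sketch below computes the proportion of missing values per variable and then checks whether missingness in one example variable differs by age band; the choice of `BPSysAve` and the age cut points are illustrative assumptions.

```r
library(NHANES)
data(NHANES)

# Proportion of missing values per variable, largest first
miss_prop <- sort(colMeans(is.na(NHANES)), decreasing = TRUE)
head(round(miss_prop, 2), 10)

# Does missingness in one variable depend on age?
bp_missing <- is.na(NHANES$BPSysAve)
age_band <- cut(NHANES$Age, breaks = c(0, 18, 40, 80), include.lowest = TRUE)

# Share of missing blood pressure values within each age band
tapply(bp_missing, age_band, mean)
```

Large differences between age bands would suggest that the values are not missing completely at random, which affects which methods are appropriate.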
Option 4: Custom Research Question
Propose your own research question using NHANES data, subject to TF approval. Must demonstrate clear analytical approach and feasibility.
Important Data Considerations
Missing Values: Many NHANES variables have substantial missing data. Consider whether this affects your analysis or could be the focus of your study.
Outcome Variables: Examples include systolic blood pressure, diabetes risk, depression scores, sleep patterns, smoking status, BMI. Continuous outcomes may be easier to model initially.
Project Submission Requirements
You will submit your project using Git. Your project should be completely reproducible, meaning all the code and data needed to render your report from scratch should be in the repository.
Required Submissions to your TF:
- GitHub Repository: Submit the link to your team’s GitHub repository with all components below
- Oral Presentation: Each team must schedule a 20-minute oral presentation with their assigned TF via Zoom. Scheduling must be completed by December 12, 2025, for presentations during December 16-19, 2025. All team members must be present during the presentation. The TF will ask each team member specific questions about different aspects of the project based on their individual contributions detailed in the contribution summary document (e.g., if a member contributed to data analysis, they may be asked about model choice, coding decisions, statistical methods, etc.).
Oral Presentation Evaluation Rubric
The oral presentation will be evaluated as a group grade out of 10 points based on the following criteria:
| Score | Criteria |
|---|---|
| 0-1 | No meeting scheduled or major absence of team members |
| 2-3 | Limited understanding of project components; significant gaps in explanations |
| 4-5 | Moderate understanding of project components; difficulty explaining individual contributions/methodological choices |
| 6-7 | Good understanding of project components; able to explain most individual contributions/methodological choices |
| 8-9 | Very good understanding of all project components; strong defense of analytical approaches with minimal gaps |
| 10 | Excellent understanding of all project components; exceptional ability to defend and discuss all aspects of the analysis |
Final Report Evaluation Rubric
The final report will be evaluated as a group grade out of 10 points based on the following criteria:
| Score | Criteria |
|---|---|
| 9-10 (Excellent) | Clear, well-structured report with sophisticated analysis. |
| 8-8.9 (Very Good) | Well-written report with good analysis. |
| 7-7.9 (Good) | Adequate report structure with acceptable analysis. |
| 6-6.9 (Satisfactory) | Report meets basic requirements but has several areas for improvement. |
| 4-5.9 (Needs Improvement) | Report has significant structural problems or analysis errors. |
| 0-3.9 (Inadequate) | Report does not meet basic requirements. |
Detailed Criteria:
9-10 Points (Excellent): Clear, well-structured report with sophisticated analysis. All sections meet word count requirements. Exceptional use of statistical methods, excellent data visualization, and insightful interpretation. Professional formatting with proper citations. Demonstrates deep understanding of NHANES data and chosen methodology.
8-8.9 Points (Very Good): Well-written report with good analysis. Minor issues in structure or presentation. Good use of statistical methods and visualizations. Most sections meet requirements with solid interpretation of results.
7-7.9 Points (Good): Adequate report structure with acceptable analysis. Some sections may be slightly under/over word count. Basic statistical methods applied correctly. Visualizations present but could be improved. Interpretation shows understanding but lacks depth.
6-6.9 Points (Satisfactory): Report meets basic requirements but has several areas for improvement. Statistical analysis is basic or contains minor errors. Limited interpretation of results. Some formatting or structural issues.
4-5.9 Points (Needs Improvement): Report has significant structural problems or analysis errors. Poor data visualization or interpretation. Major sections missing or substantially under word count. Limited understanding of methodology.
0-3.9 Points (Inadequate): Report does not meet basic requirements. Major analysis errors, missing key sections, or demonstrates poor understanding of the data and methods. Unprofessional presentation.
TF Feedback Process
Your TF will provide detailed feedback and final grades through GitHub Issues:
After your final submission, your TF will create Issues in your repository for:
- Overall project feedback and final grade breakdown
- Specific comments on analysis, methods, or presentation
- Individual contribution assessment based on commit history and contribution files
Grade breakdown will include:
- Final Report Score (out of 10 points)
- Oral Presentation Score (out of 10 points)
- Overall Final Project Grade calculation
Teams can respond to TF feedback through Issue comments if clarification is needed.
Note: Check your repository for Issues after the final project deadline for comprehensive feedback and grades.
Report Structure
You will prepare a comprehensive report following the style of an academic paper. The report is divided into the following five sections. The approximate word counts are meant to help you reach a target of 2,500 to 3,000 words. You may include up to four figures and up to two tables.
Abstract (150-200 words)
- Purpose: The abstract provides a concise summary of your project, including its objectives, key findings, and significance. Write this section last, after completing all other sections, to accurately reflect your project’s focus and main results.
- Guidelines: Limit this section to 150-200 words. Briefly outline the purpose of your study, the approach you used, and the primary results and conclusions. The abstract should be clear, succinct, and give readers an immediate understanding of what your project entails.
Introduction (500-600 words)
- Purpose: The introduction sets the stage for your project, presenting the background and rationale for your analysis. Explain why the topic is significant and justify your choice of NHANES vs. NHANESraw.
- Guidelines: Start with a broad overview of the topic, gradually narrowing down to your specific focus. Conclude with a clear statement of your research questions, hypotheses, or objectives. Use 2-3 paragraphs to establish a solid foundation for the rest of the paper.
Methods (600-700 words)
- Purpose: This section details the data sources, methods, and analytical techniques you used to conduct your analysis. It should be specific enough that someone else could replicate your study using the same resources and approach.
- Guidelines: Describe the NHANES dataset version you used and justify your choice. Outline your approach for cleaning and analyzing the data, including any statistical or computational methods applied. Clearly explain any assumptions or limitations in your approach, particularly regarding missing data handling.
Results (500-600 words)
- Purpose: The results section presents the main findings of your analysis without interpretation. Organize the data logically to highlight key insights, using tables, figures, and charts to illustrate trends and comparisons.
- Guidelines: For each result, briefly describe it and refer to relevant visuals or tables where appropriate. Do not provide explanations or discuss implications in this section; focus only on presenting the findings clearly and accurately.
Use of AI Tools
Students are welcome to use AI tools as a complementary aid, but they must clearly state in their report where AI was used (e.g., text generation, editing, data analysis suggestions, or AI-assisted conclusions). AI should serve as a productivity and learning tool, not as the primary author of the report.
Discussion (600-700 words)
- Purpose: In the discussion, interpret the significance of your findings, explore potential implications, and relate the results back to your initial research questions or hypotheses. This section allows you to discuss any patterns, unexpected findings, or limitations and suggest possible future research.
- Guidelines: Analyze your results in the context of your research question and relevant health literature. Consider what your findings reveal, any limitations they may have (particularly regarding missing data or survey design), and how they might impact future work or policy. End with a brief conclusion summarizing your main insights.
Your final report should be professionally formatted, with each section clearly labeled and referenced. Aim for clarity, precision, and a well-organized presentation of your analysis.
Total Word Count: Approximately 2,500-3,000 words.
Supplementary Methods (no limit)
You can include a separate document titled Supplementary Methods.
Purpose: Share any mathematical derivations, data visualizations, or tables needed to justify the choices described in the Methods Section. You can also provide further support for the claims made in the Results Section. You can refer to this document in the main report.
Guidelines: There is no limit on the length of this section or on the number of figures and tables. However, be careful not to overwhelm the graders with too much information.
GitHub Repository Requirements
Your repository must include:
- Directory structure: `code`, `data`, and `docs` directories
- Analysis scripts: At least one script for data wrangling in the `code` directory
- Main report: One file called `final-project.qmd` that renders to produce the final report (can be in `code` or home directory)
- README file: Explaining how to reproduce all results
- Project outline: `Outline.txt` with your initial plan and progress updates
- Individual contribution files: Each team member creates a file with their name documenting their specific contributions and effort (hours/week)
- Data handling: Include code that loads NHANES data via the R package (no need to store raw data)
Git Requirements: We expect to see at least five meaningful commits by each person demonstrating collaborative development throughout the project timeline.
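The required layout can be scaffolded in a few lines of base R, run once from the root of the cloned repository. This is only a sketch: the file names `README.md` and `code/wrangle.R` are assumptions (the requirements name only `Outline.txt` and `final-project.qmd`), and existing files are left untouched.

```r
# Run from the root of the cloned team repository
for (d in c("code", "data", "docs")) {
  dir.create(d, showWarnings = FALSE)
}

# Placeholder files; README.md and code/wrangle.R are assumed names
starter_files <- c(
  "README.md",
  "Outline.txt",
  "final-project.qmd",
  file.path("code", "wrangle.R")
)

# Create each file only if it does not already exist
for (f in starter_files) {
  if (!file.exists(f)) file.create(f)
}

# Show the resulting files
list.files(recursive = TRUE)
```

Git does not track empty directories, so `data` and `docs` will not appear on GitHub until each contains at least one file.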