Project
Description
The MS&E 125 project provides hands-on experience with key steps of the data science pipeline:
- Asking research questions
- Identifying dataset(s) to help you answer your questions
- Cleaning, exploring, and analyzing datasets using tools from 125 and beyond
- Synthesizing and compiling your results in a short report
- Presenting your results to an audience
You are free to pursue any topic related to applied statistics. In previous years, teams have considered athletic performance, gender inequality, farming practices, restaurant quality, music success, gentrification, and standardized testing, just to name a few. Any data-driven investigation is fair game.
At the end of the quarter, each team of 3 students will prepare a 3-page memo summarizing their key findings and record a lightning talk of approximately 5-7 minutes to be reviewed by a member of the teaching staff.
Along the way, each team will receive feedback from the course staff as part of four mandatory sub-components:
- 15-minute required project meeting with the course staff
- Project proposal
- HW5 check-in submission
- HW6 check-in submission
Grading
The project is intentionally open-ended and graded holistically. Given the unique challenges faced by each team, there isn’t a one-size-fits-all rubric. Some teams will spend more time collecting complex data and have simpler analyses, while others will pursue more complex analyses of data that have already been cleaned.
As long as there’s evidence that your team has spent sufficient time collecting, cleaning, exploring, and analyzing your data, and has taken into account the comments on your proposal and on your HW5 and HW6 check-in submissions, you should receive high marks.
Important note: Suppose you spend 20 total hours on your project. As it turns out, 5 of those 20 hours were spent cleaning a dataset. Even though cleaning the dataset took 25% of your time, you probably should not devote 25% of your memo and presentation to describing the data cleaning process. Your memo and presentation should highlight the key takeaways of your analysis.
If you have concerns about the specific direction of your project, please see a member of the teaching staff during office hours. We’re happy to point you in the right direction!
Only one team member needs to submit the project via Gradescope. Please add all team members to your submission group.
FAQs
We may update this section with new questions from Ed and office hours. If you do not see your question answered below, be sure to ask!
Are there any sample projects?
You can find some examples of successful final projects here. Note that these projects follow a different format than this year’s project.
Can the memo be longer than 3 pages?
You’re welcome to include an appendix of additional relevant results, but we can’t guarantee that the teaching team will review anything beyond 3 pages. Please make sure not to just dump all of your extraneous findings and plots in an appendix unless there is a good reason to include them. While research papers often have just 3-5 main plots, authors will often produce hundreds of plots over the course of a project that the public never sees.
Can presentations be longer than 7 minutes?
No. The member of the teaching staff reviewing your presentation will stop watching at 7 minutes. Practice your presentation several times to ensure that you stay within the time limit.
How should I record the presentation?
You should record yourself in front of a large screen (e.g., a flat screen TV in a conference room at Stanford) with your slides/plots/diagrams visible in the same frame as your face. Do not read off of a script, do not present via screen share, and do not keep your camera off. In other words, treat the presentation as though you are presenting to work colleagues live and in-person.
Your key plots should be the main attraction of your presentation. All plot elements should be clearly visible. You are welcome to include additional information aside from just your plots, but avoid slides with a lot of text.
Each team member should present in front of the same screen. If it is impossible for you to coordinate with your team members to be in the same location at the same time to record the presentation, please let your project mentor know ahead of time. In these cases, you should each present separately in front of different screens and splice together your videos.
If you are unable to access a space on or off campus with a large screen, or there is something else preventing you from presenting in the format described above, please get in touch with your project mentor about finding an alternative solution.
Can reports be double spaced?
No. The report should be single-spaced with a reasonably sized font and standard margins. Keep in mind that your 3-5 figures will take up a lot of space in your memo, and thoughtfully-designed plots+captions are arguably more important than the main text.
Can we collect our own data?
Yes! Many past students have used surveys to answer their research questions.
If you plan to create a survey, be sure to receive approval of your survey from the course staff before publishing it. You’ll also want to publish your survey well before the project deadline.
Effective survey design can take much longer than expected, so it’s not a good option for a last-minute project!
Is there a rough outline of what you’re looking for in the memo and presentation?
As mentioned above, the project is intentionally open-ended and doesn’t have a fixed rubric. That being said, here is a sample outline that has worked for many projects in the past. Keep in mind that your outline may differ!
Introduction and motivation
- What is/are your research question(s)?
- Why is each question interesting?
- What’s your hypothesis?
- What’s the brief summary of your results?
Relevant work
- Who else has tried to answer your question(s)?
- Were they successful?
- How does your project relate to or build on existing work?
- You should be able to recycle a lot of your proposal submission in this section!
Data and methods
- How will you go about answering your research question(s)?
- What data sources will you use?
- What methods will you use, and how will they answer your research question(s)?
- For most groups, the ideal methods will be extensive exploratory data analysis, followed by linear and/or logistic regression(s). If this is the case for your project, you would use this section to explain how your plots and regression(s) will answer your research questions.
Results and discussion
- What are your findings?
- How do you interpret those findings?
- This will probably be your longest section, so spend the most time here.
- You shouldn’t have more than 3-5 total plots+figures in your memo. The presentation should also include no more than 3-5 plots. Only include plots if they are critically relevant to your story.
- Spend sufficient time making your plots pretty! With tools like ChatGPT, it’s a lot easier to figure out how to clean up plots. Aim to make your plots as clean as a plot you might see in a professional news outlet. You’re welcome to discuss potential improvements to your plots during office hours.
Conclusion
- To what extent did you answer your research question?
- Was your hypothesis correct?
- With infinite time and resources, how would you go about better answering your research question(s)?
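For teams following the exploratory-analysis-then-regression approach mentioned under Data and methods, the workflow might look roughly like the sketch below. It is only an illustration: the data are made-up placeholders, the variable names (`hours`, `score`) and helper functions (`summarize`, `fit_line`, `r_squared`) are hypothetical, and it uses plain Python rather than any particular tool from the course.

```python
# Illustrative sketch only: the numbers below are made-up placeholder data.
from statistics import mean, stdev

# Hypothetical variables: hours studied (x) and exam score (y) for eight students.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
score = [52.0, 55.0, 61.0, 60.0, 68.0, 71.0, 75.0, 78.0]


def summarize(values):
    """Basic exploratory summary: count, mean, standard deviation, min, max."""
    return {
        "n": len(values),
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Step 1: explore each variable before modeling.
print("hours:", summarize(hours))
print("score:", summarize(score))

# Step 2: fit the regression and report how well it describes the data.
intercept, slope = fit_line(hours, score)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
print(f"R^2 = {r_squared(hours, score, intercept, slope):.3f}")
```

In a memo, the summary step would typically appear as one or two plots and the regression step as a short table of coefficients, with the text explaining how each one bears on the research question.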