📝 Assignment 5
Due date: Friday, May 17 at midnight Pacific time (i.e., 1 minute after 11:59pm).
Unless otherwise stated, assignments are to be done individually. You are welcome to work with others to master the principles and approaches used to solve the homework problems, but the work you turn in should be your own.
This assignment has not been seen by a previous cohort of MS&E 125 students, so there may be some unforeseen hiccups. If anything seems confusing or unclear, post on Ed.
We will use this Ed post to track errors and clarifications on HW5.
📮 Submission
Submit your assignment via Gradescope.
- For the Google Colab submission, run all of your cells using the Run all command in the Runtime menu. Then, download your completed Google Colab notebook as an .ipynb file. Finally, use this website to convert your .ipynb file to .pdf format.
- Proofread the PDF to make sure all of your answers and plots are visible and not cut off. Missing answers will not receive credit, and cannot be submitted beyond the slip day deadline.
- If your PDF is really long, it is possible that your code is printing out the entire dataset or a really long vector. Please make sure to comment out any code that prints more information than each question asks you for.
- Issues converting to .pdf? Make sure there are no error messages in the outputs after you run all cells. Please do not use any special characters in the filename of the .ipynb file that you upload.
Submit your Colab PDF to Gradescope. Make sure to tag your answers properly on Gradescope, or else you may be docked points.
The project check-in submission should be emailed directly to your project mentor.
- Only one team member has to send the email.
- Please CC all of your project team members.
- Unless you have been notified otherwise, your project mentor is the member of the teaching staff who sent you your project proposal feedback.
🫀 Lab: Heart disease prediction (70% of the assignment grade)
Complete the HW5 Lab Notebook
Why complete this problem? Binary classification is an extremely common problem with a variety of applications. Linear probability models and logistic regression are critical components of the data science toolkit. This lab will walk you through realistic examples of fitting and interpreting these models, and teach you the classification metrics needed to assess and compare model performance.
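As a rough illustration of what "classification metrics" can mean here, the sketch below computes accuracy, precision, and recall from predicted probabilities using only plain Python. The labels, probabilities, 0.5 threshold, and function names are all made up for illustration; the lab notebook may use different metrics, data, or library functions.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, probs, threshold=0.5):
    """Turn predicted probabilities into 0/1 predictions, then score them."""
    # Predict the positive class when the probability reaches the threshold.
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical outcomes (1 = has heart disease) and model probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.1, 0.8, 0.3]

metrics = classification_metrics(y_true, probs)
print(metrics)
```

Changing the threshold trades precision against recall, which is one reason a single accuracy number is often not enough to compare two classifiers.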
📈 Project Check-In (30% of the assignment grade)
Generate one or more plots related to your project’s research question.
- The plot(s) should be generated by your team, using public data that you have identified and imported.
- The plot(s) do not have to fully answer your research question, but should be relevant to your project’s overall theme, convey some nontrivial message about your data, and push your project forward.
To accompany your plot(s), write one extended paragraph that 1) briefly describes your plot(s) and how they address your research question, and 2) describes the next steps for your project as they relate to the findings from your plot(s).
Note: If you are collecting your own data for the project using a survey, you should instead submit a final draft of the survey questions that you plan to ask, along with a paragraph that explains how the answers to each of your survey questions will address your main research question.