📝 Assignment 2
Due date: Friday, April 19 at 5pm Pacific.
⏳ We recommend attempting each problem ASAP so you can accurately estimate the time needed to complete the assignment.
- This is not an assignment to start the night before the due date!
- Remember that MS&E 125 is a 4-unit course. For the median student, this is supposed to translate to 3 weekly hours of lecture and 9 weekly hours of working on assignments and studying.
Unless otherwise stated, assignments are to be done individually. You are welcome to work with others to master the principles and approaches used to solve the homework problems, but the work you turn in should be your own.
Unlike HW1, the collaboration policies on HW2 are not relaxed beyond what is stated above. Please do not share answers directly with other students.
This assignment has not been seen by a previous cohort of MS&E 125 students, so there may be some unforeseen hiccups. If anything seems confusing or unclear, please create an Ed post.
We will use this Ed post to track errors and clarifications on HW2.
📮 Submission
Submit your assignment via Gradescope. Make sure to tag your answers properly on Gradescope, or else you may be docked points.
- For the math problems, prepare a photo of your handwritten answers to each problem, and convert the photo to PDF.
- For the Google Colab submission, first run all of your cells using the Run all command in the Runtime menu. Then, download your completed Google Colab notebook as an .ipynb file. Finally, use this website to convert your .ipynb file to .pdf format. Proofread the PDF to make sure all of your answers and plots are visible and not cut off. If your PDF is really long, it is possible that your code is printing out the entire dataset or a really long vector. Please make sure to comment out any code that prints more information than each question asks you for.
  - Issues converting to .pdf? Make sure there are no error messages in the outputs after you run all cells. Please do not use any special characters in the filename of the .ipynb file that you upload.
- For the screencast feedback, submit a PDF of a text file containing your feedback. Additionally, submit your feedback using this Google Form, which will allow us to quickly send your feedback to the student who recorded the screencast.
Finally, concatenate the three PDFs above using a tool of your choice. For example, you could use this website.
🎲 Deriving standard errors and confidence intervals (20% of the assignment grade)
For each of the estimators below, describe a hypothetical industry scenario where you could use the estimator to learn something interesting about your product, clients, and/or customers.
Then, using the methods we discussed in class, derive the standard error of each estimator, and a formula for a 95% confidence interval for each estimator.
Hint: All of the derivations proceed almost identically to the derivations for the coin flip estimator from class.
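As a reminder of the pattern the hint refers to, here is a sketch of the coin-flip case. The notation (\(p\), \(\hat{p}\), \(n\), and the 1.96 multiplier) is an assumption about how the derivation was presented in class, so defer to your lecture notes where they differ. Suppose \(X_i \overset{\mathrm{iid}}{\sim} \text{Bernoulli}(p)\) and \(\hat{p} = \frac{1}{n} \sum_{i=1}^n X_i\). Because the \(X_i\) are independent, the variance of the sum is the sum of the variances:

$$\mathrm{Var}(\hat{p}) = \frac{1}{n^2} \sum_{i=1}^n \mathrm{Var}(X_i) = \frac{p(1-p)}{n}, \qquad \mathrm{SE}(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}.$$

Using the normal approximation and plugging in \(\hat{p}\) for the unknown \(p\), an approximate 95% confidence interval is

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$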
1. Suppose \(X_i \overset{\mathrm{iid}}{\sim} \text{Bernoulli}(p_x)\) and \(Y_i \overset{\mathrm{iid}}{\sim} \text{Bernoulli}(p_y)\), where \(p_x\) and \(p_y\) are unknown but fixed values.
Let $$\hat{p}_x - \hat{p}_y = \frac{1}{n_x} \sum_{i=1}^{n_x} X_i - \frac{1}{n_y} \sum_{i=1}^{n_y} Y_i.$$
We will use \(\hat{p}_x - \hat{p}_y\) as the estimator for \(p_x - p_y\).
2. Suppose \(X_i \overset{\mathrm{iid}}{\sim} N(\mu, \sigma^2)\), where \(\sigma^2\) is a known and fixed value, and \(\mu\) is an unknown but fixed value.
Let $$\bar{x} = \frac{1}{n} \sum_{i=1}^n X_i.$$
We will use \(\bar{x}\) as the estimator for \(\mu\).
3. Suppose \(X_i \overset{\mathrm{iid}}{\sim} N(\mu_x, \sigma_x^2)\) and \(Y_i \overset{\mathrm{iid}}{\sim} N(\mu_y, \sigma_y^2)\), where \(\sigma_x^2\) and \(\sigma_y^2\) are known and fixed values, and \(\mu_x\) and \(\mu_y\) are unknown but fixed values.
Let $$\bar{x} - \bar{y} = \frac{1}{n_x} \sum_{i=1}^{n_x} X_i - \frac{1}{n_y} \sum_{i=1}^{n_y} Y_i.$$
We will use \(\bar{x} - \bar{y}\) as the estimator for \(\mu_x - \mu_y\).
Why complete this problem? Along with the coin-flip estimator we derived in lecture, these three estimators are the backbone of basic statistical inference in industry, medicine, and academia. Estimators (2) and (3) are slightly different in practice, since the population variance is typically an unknown quantity that we have to estimate with the sample variance.
🗣️ Feedback on another student’s screencast (20%)
You will be emailed the link to another student’s screencast from HW1.
- If you have not received an email with a link by Tuesday, April 16, please open a private Ed post.
For this exercise, you will write detailed feedback on the screencast emailed to you.
- Keep in mind that this feedback will be anonymously provided to the student who recorded the screencast.
- Please be constructive and supportive with any criticism, and do not hold back on providing praise where deserved!
As you write your feedback, you may want to consider the prompts below. However, do not feel limited to just these prompts, and do not feel compelled to address every single prompt.
- What did you enjoy most about the presentation?
- What insights did you find particularly interesting?
- Did the presenter follow the three key tips of describing the x-axis, describing the y-axis, and explaining a plot feature (e.g., a point or line) in context, before diving into the details?
- Could the presenter have done anything to help you understand the plot more easily? Were you confused at any point?
- Did you find the tone of the presentation engaging? Did it sound like the presenter had practiced their presentation, or that they spent time thoughtfully writing a script for the presentation?
- Did the presenter sufficiently describe the contents of the plot?
- Did you have enough background information to understand the plot? Could the presentation have benefited from any more background information?
- Did the presenter describe the key takeaways of the plot? In other words, did the presenter explain why the plot actually matters in a real-life context, as opposed to just explaining how to read the plot?
- Did the presenter provide any extraneous information or “over-describe” anything? In other words, could the presenter have shortened any parts of the presentation without harming its key takeaways?
- Would any parts of the presentation benefit from more description or detail? Did anything feel rushed?
- Do you have any follow-up questions for the presentation? For example, do you see any natural extensions of what was presented, or have any ideas of what could go in a follow-up presentation on the same topic? Answering this question will help your presenter think about potential project ideas, and give you a chance to exercise your creative thinking skills.
- Do you have any “nits” about the presentation (i.e., very small changes that could improve the presentation, like a typo or mispronunciation)? If you choose to answer this prompt, it should not take up more than 10% of the text of your entire feedback. Focus your energy on the “big picture” prompts.
Your feedback will be graded based on demonstrated effort and thoughtfulness.
- You should aim to write a couple of paragraphs of feedback.
- Your feedback can alternatively be written as an organized, bulleted list equivalent in word count to a couple of paragraphs.
Why complete this problem? Writing detailed feedback on another student’s plot presentation will help you become a better presenter. One of the hardest skills to develop is “presentation empathy”, or a sense of how someone who has never seen your presentation before will interpret your presentation. After staring at your own slides or script for hours, it can be hard to see your work with “fresh eyes”. If you are at all surprised by the feedback you receive on your own screencast (or receive feedback with which you disagree!), take that moment as an excellent opportunity to understand how other people can interpret your work differently than you interpret your own work. Remember, it is not the audience’s responsibility to decipher your presentation for its intended interpretation. You need to carefully prepare your presentation so that the intended interpretation is crystal clear!
🍬 🪖 Lab: Introduction to inference with M&Ms and helmets (60%)
Complete the HW2 Lab Notebook.
The helmet data for the class can be found at this link.
- Go to the View menu, press Collapse sections, and then press Expand sections to automatically unhide all of the sections of the homework.
- ⏳ This is the most time-consuming component of the assignment, so get started ASAP (and, if needed, get help early!).
Why complete this problem? Confidence intervals inform decisions across industries and fields. After completing this notebook, you will have a foundational understanding of the statistical theory behind normally-approximated confidence intervals, and how to construct them with R. You will also be able to thoroughly question the assumptions that are required for constructing valid confidence intervals, and think through the downstream consequences of violating those assumptions.
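As a rough illustration of what a normally-approximated confidence interval can look like in R, the sketch below computes a 95% interval for a proportion. The data are simulated and every variable name is hypothetical. It is not taken from the lab notebook, which may structure the computation differently.

```r
# Simulated 0/1 outcomes (hypothetical data, not the class M&M or helmet data)
set.seed(125)
x <- rbinom(n = 200, size = 1, prob = 0.3)

n <- length(x)
p_hat <- mean(x)                         # sample proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n)  # plug-in estimate of the standard error
z <- qnorm(0.975)                        # roughly 1.96 for a 95% interval

ci <- c(lower = p_hat - z * se_hat, upper = p_hat + z * se_hat)
print(ci)
```

The interval leans on the sample being large enough for the normal approximation to be reasonable and on the observations being independent, which are the kinds of assumptions the lab asks you to question.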