Problem Set 2
Due by 11:59 PM Sunday September 6, 2020
ANSWERS:
Instructions
There are several ways you can complete and turn in this homework assignment:
Type up any applicable answers (saving any plots as images and including them) in a (e.g. Word) document and save it as a PDF and turn in a (commented!) .R file of commands for each relevant question.
If you wish to write out answers by hand, you may either print the pdf above or write your answers (all I need is your work and answers) on your own paper and then please scan/photograph & convert them to a single PDF, if they are easily readable, but this is not preferred. See my guide to making a PDF
Download the
.Rmd
file, do the homework in markdown, and email to me a singleknit
tedhtml
orpdf
file. Be sure that it shows all of your code (i.e. all chunks haveecho = TRUE
options), otherwise I will also ask for the markdown file.
To minimize confusion, I suggest creating a new R Project
(e.g. hw1
) and storing any data and plots in that folder on your computer. See my example workflow.
You may work together (and I highly encourage that) but you must turn in your own answers. I grade homeworks 70% for completion, and for the remaining 30%, pick one question to grade for accuracy - so it is best that you try every problem, even if you are unsure how to complete it accurately.
Theory and Concepts
Question 1
In your own words, explain the difference between endogeneity and exogeneity.
Question 2
Part A
In your own words, explain what (sample) standard deviation means.
Part B
In your own words, explain how (sample) standard deviation is calculated. You may also write the formula, but it is not necessary.
Problems
For the remaining questions, you may use R
to verify, but please calculate all sample statistics by hand and show all work.
Question 3
Suppose you have a very small class of four students that all take a quiz. Their scores are reported as follows:
\[\{83, 92, 72, 81\}\]
Part A
Calculate the median.
Part B
Calculate the sample mean, \(\bar{x}\).
Part C
Calculate the sample standard deviation, \(s\).
Part D
Make or sketch a rough histogram of this data, with the size of each bin being 10 (i.e. 70’s, 80’s, 90’s, 100’s). You can draw this by hand or use R
.If you are using ggplot
, you want to use +geom_histogram(breaks=seq(start,end,by))
and add +scale_x_continuous(breaks=seq(start,end,by))
. For each, it creates bins in the histogram, and ticks on the x axis by creating a seq
uence starting at start
(a number), ending at end
(number), by
a certain interval (i.e. by 10
s.).
Is this distribution roughly symmetric or skewed? What would we expect about the mean and the median?
Part E
Suppose instead the person who got the 72 did not show up that day to class, and got a 0 instead. Recalculate the mean and median. What happened and why?
Question 4
Suppose the probabilities of a visitor to Amazon’s website buying 0, 1, or 2 books are 0.2, 0.4, and 0.4 respectively.
Part A
Calculate the expected number of books a visitor will purchase.
Part B
Calculate the standard deviation of book purchases.
Part C
Bonus: try doing this in R
by making an initial dataframe of the data, and then making new columns to the “table” like we did in class.
Question 5
Scores on the SAT (out of 1600) are approximately normally distributed with a mean of 500 and standard deviation of 100.
Part A
What is the probability of getting a score between a 400 and a 600?
Part B
What is the probability of getting a score between a 300 and a 700?
Part C
What is the probability of getting at least a 700?
Part D
What is the probability of getting at most a 700?
Part E
What is the probability of getting exactly a 500?
Question 6
Redo problem 5 by using the pnorm()
command in R
.Hint: This function has four arguments: 1. the value of the random variable, 2. the mean of the distribution, 3. the sd of the distribution, and 4. lower.tail
TRUE
or FALSE
.