Problem set 6
This problem set explores probability through two classic scenarios: the birthday paradox and casino games. You’ll use both Monte Carlo simulation and exact mathematical calculations to understand these phenomena.
Please answer each of the exercises below. For those asking for a mathematical calculation, please use LaTeX to show your work.
Important: Make sure that your document renders in less than 5 minutes.
Write a function called same_birthday that takes a number n as an argument, randomly generates n birthdays, and returns TRUE if two or more birthdays are the same. You can assume nobody is born on February 29.
Hint: use the functions sample, duplicated, and any.
same_birthday <- function(n){
## Your code here
}
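To see how the three hinted functions behave before combining them, here is a minimal sketch on a toy vector. The vector days and the variable names are hypothetical and chosen only for illustration; this is not a solution to the exercise.

```r
# Toy vector of days of the year with one repeated value (hypothetical data)
days <- c(12, 200, 45, 200, 301)

# duplicated() marks each element that already appeared earlier in the vector
dup <- duplicated(days)

# any() collapses a logical vector to a single TRUE/FALSE
has_repeat <- any(dup)

# sample() with replace = TRUE allows the same day to be drawn more than once
set.seed(1)
draws <- sample(1:365, 5, replace = TRUE)
```

Here dup is TRUE only at the second occurrence of 200, so has_repeat is TRUE.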
- Suppose you are in a classroom with 50 people. If we assume this is a randomly selected group of 50 people, what is the chance that at least two people have the same birthday? Use a Monte Carlo simulation with \(B=1,000\) trials based on the function same_birthday from the previous exercise.
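The general Monte Carlo pattern is to repeat a random experiment B times and take the proportion of trials in which the event occurred. The sketch below applies that pattern to a deliberately different, hypothetical event (a fair die showing a six), so the names results and estimate are illustrative assumptions rather than part of the exercise.

```r
set.seed(1)
B <- 10^3

# Each trial rolls one fair die and records whether it shows a six
results <- replicate(B, sample(1:6, 1) == 6)

# The proportion of TRUE values estimates the probability (exact value is 1/6)
estimate <- mean(results)
```

Taking the mean of a logical vector works because R treats TRUE as 1 and FALSE as 0.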
B <- 10^3
## Your code here
- Redo the previous exercise for several values of n to determine at what group size the chance becomes greater than 50%. Set the seed to 1997.
set.seed(1997)
compute_prob <- function(n, B = 10^3){
## Your code here
}
## Your code here
These probabilities can be computed exactly instead of relying on Monte Carlo approximations. We use the multiplication rule:
\[ \mbox{Pr}(n\mbox{ different birthdays}) = 1 \times \frac{364}{365}\times\frac{363}{365} \dots \frac{365-n + 1}{365} \]
Plot the probabilities you obtained using Monte Carlo as points and the exact probabilities with a red line.
Hint: use the function prod to compute the exact probabilities.
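As a small worked case of the multiplication rule, the sketch below evaluates the product for a single small group size, n = 3, and then takes the complement to get the chance of a shared birthday. The variable names are assumptions made for this illustration; generalizing to a function of n is left to the exercise.

```r
n <- 3

# 365/365 * 364/365 * 363/365: probability that all 3 birthdays differ
p_all_different <- prod(seq(365, 365 - n + 1) / 365)

# Complement: probability that at least two of the 3 share a birthday
p_shared <- 1 - p_all_different
```

For n = 3 the product is 364 × 363 / 365², about 0.9918, so the chance of a shared birthday is about 0.0082.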
exact_prob <- function(n){
## Your code here
}
## Your code here
- Note that the points don’t quite match the red line. This is because our Monte Carlo simulation was based on only 1,000 iterations. Repeat exercise 2, but for n = 23, and try B <- seq(10, 250, 5)^2 as the numbers of iterations. Plot the estimated probability against sqrt(B). At what value of sqrt(B) do the estimates consistently stay within 0.005 of the exact probability? Add horizontal lines around the exact probability \(\pm\) 0.005. Note this could take several seconds to run. Set the seed to 1998.
set.seed(1998)
B <- seq(10, 250, 5)^2
## Your code here
Repeat the comparison from question 4 (Monte Carlo points vs exact red line), but use your findings from question 5 to choose an appropriate number of iterations B so that the points practically fall on the red curve.
Hint: If the number of iterations you chose is too large, you will achieve the correct plot but your document might not render in less than five minutes.
n <- seq(1,60)
## Your code here
In American Roulette, with 18 red slots, 18 black slots, and 2 green slots (0 and 00), what is the probability of landing on a green slot?
\[ \mbox{Derivation here} \]
The payout for winning on green is $17. This means that if you bet a dollar and it lands on green, you get $17. If it lands on red or black, you lose your dollar. Create a sampling model to simulate the random variable \(X\) representing the casino’s profit from a single $1 bet on green. Use the sample function.
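A sampling model of this kind lists the possible outcomes and passes their probabilities to sample through its prob argument. The sketch below does this for a made-up game, not the green bet: the payoffs in outcomes and the probabilities in probs are hypothetical values chosen only to show the mechanics.

```r
set.seed(1)

# Hypothetical game: the house gains $1 with probability 0.6
# and pays out $2 with probability 0.4
outcomes <- c(1, -2)
probs <- c(0.6, 0.4)

# One bet: a single draw from the two outcomes
x <- sample(outcomes, 1, prob = probs)

# 1,000 independent bets: draw with replacement
many <- sample(outcomes, 1000, replace = TRUE, prob = probs)
```

Without replace = TRUE, sample would refuse to draw more values than there are outcomes, so it is needed whenever many bets are simulated at once.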
## Your code here
- Now create a random variable \(S\) representing the Casino’s total winnings if \(n = 1,000\) people bet on green. Use a Monte Carlo simulation with B = 10,000 trials to estimate the probability that the Casino loses money.
n <- 1000
## Your code here
What is the expected value of \(X\)?
\[ \mbox{Your derivation here.} \]
What is the standard error of \(X\)?
\[ \mbox{Your derivation here.} \]
What is the expected value of \(S\)? Does the Monte Carlo simulation confirm this?
\[ \mbox{Your derivation here} \]
## Your code here
What is the standard error of \(S\)? Does the Monte Carlo simulation confirm this?
\[ \mbox{Your derivation here.} \]
## Your code here
- Use data visualization to convince yourself that the distribution of \(S\) is approximately normal. Make a histogram and a QQ-plot of standardized values of \(S\). The points in the QQ-plot should fall close to the identity line.
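Standardizing means subtracting the mean and dividing by the standard deviation, after which a normal QQ-plot can be compared directly with the identity line. The sketch below shows this on hypothetical totals from a made-up two-outcome game (payoffs 1 and -2, win probability 0.6); the vector s here is an assumption for illustration and is not the \(S\) from the exercise.

```r
set.seed(1)

# Hypothetical totals: number of house wins out of 1,000 bets, repeated 2,000 times
wins <- rbinom(2000, size = 1000, prob = 0.6)
s <- wins * 1 + (1000 - wins) * (-2)

# Standardize: center at 0 and scale to unit standard deviation
z <- (s - mean(s)) / sd(s)

hist(z)        # should look roughly bell-shaped
qqnorm(z)      # sample quantiles against theoretical normal quantiles
abline(0, 1)   # identity line for reference
```

Base R graphics are used here for brevity; the same plots can be drawn with any plotting package.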
## Your code here
Notice that the normal approximation is slightly off for the tails of the distribution. What would make this better: increasing the number of people playing, \(n\), or the number of Monte Carlo iterations, \(B\)? Explain your reasoning.
Answer here
Now use the CLT to approximate the probability, estimated above by Monte Carlo, that the Casino loses money. Does it agree with the Monte Carlo simulation?
\[ \mbox{Your derivation here.} \]
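Under a normal approximation, the probability that a total falls below zero is the normal CDF evaluated at 0, which pnorm computes. The sketch below uses placeholder values for the mean and standard error; mu_S and se_S are assumptions for illustration only and are not the answers to the earlier exercises.

```r
# Placeholder values, assumed for illustration only
mu_S <- 200   # assumed expected value of a total
se_S <- 100   # assumed standard error of that total

# Pr(S < 0) under the normal approximation
p_loss <- pnorm(0, mean = mu_S, sd = se_S)

# Equivalent form using the standardized value (0 - mu_S) / se_S
p_loss_z <- pnorm((0 - mu_S) / se_S)
```

With these placeholder values the standardized cutoff is -2, so both expressions give about 0.0228.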
## Your code here
What is the minimum number of people \(n\) who must bet on green for the Casino to reduce the probability of losing money to 1%? Check your answer with a Monte Carlo simulation.
\[ \mbox{Your derivation here.} \]
## Your code here