Problem set 6

Published

October 26, 2025

This problem set explores probability through two classic scenarios: the birthday paradox and casino games. You’ll use both Monte Carlo simulation and exact mathematical calculations to understand these phenomena.

Please answer each of the exercises below. For exercises that ask for a mathematical calculation, use LaTeX to show your work.

Important: Make sure that your document renders in less than 5 minutes.

Write a function called same_birthday that takes a number n as an argument, randomly generates n birthdays and returns TRUE if two or more birthdays are the same. You can assume nobody is born on February 29.

Hint: use the functions sample, duplicated, and any.
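To see how the hinted functions behave on their own, here is a toy sketch (the vector `days` is made up for illustration and this is not the required function): `duplicated` flags every element that repeats an earlier one, `any` collapses a logical vector to a single `TRUE`/`FALSE`, and `sample` with `replace = TRUE` allows repeated values, as birthdays do.

```r
# Hypothetical day-of-year values, chosen so that one value repeats
days <- c(45, 200, 13, 200, 364)

# duplicated() marks each element that already appeared earlier in the vector
dup_flags <- duplicated(days)   # FALSE FALSE FALSE TRUE FALSE

# any() is TRUE if at least one element of the logical vector is TRUE
has_repeat <- any(dup_flags)    # TRUE

# sample() with replace = TRUE can draw the same day more than once
set.seed(1)
draws <- sample(1:365, 5, replace = TRUE)
```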

same_birthday <- function(n){ 
  ## Your code here
}

Suppose you are in a classroom with 50 people. If we assume this is a randomly selected group of 50 people, what is the chance that at least two people have the same birthday? Use a Monte Carlo simulation with $B = 1{,}000$ trials based on the function same_birthday from the previous exercise.

B <- 10^3
## Your code here
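The Monte Carlo pattern here is "repeat the random experiment B times and average the logical results". A minimal sketch on a smaller, hypothetical analog of the birthday problem: `same_face` rolls k six-sided dice instead of generating n birthdays. Because `mean` of a logical vector is the proportion of `TRUE` values, it serves directly as the probability estimate.

```r
# Hypothetical analog of same_birthday: k six-sided dice instead of n birthdays
same_face <- function(k) {
  faces <- sample(1:6, k, replace = TRUE)  # each die shows 1..6 with equal probability
  any(duplicated(faces))                   # TRUE if at least two dice match
}

set.seed(2025)
B <- 10^3
# replicate() runs the experiment B times and returns B logical values
results <- replicate(B, same_face(3))
# Proportion of TRUEs = Monte Carlo estimate of Pr(at least two dice match)
p_hat <- mean(results)
```

For 3 dice the exact answer is $1 - \frac{6}{6}\cdot\frac{5}{6}\cdot\frac{4}{6} = 4/9 \approx 0.444$, so `p_hat` should land near that value.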

Redo the previous exercise for several values of n to determine the smallest group size at which the chance exceeds 50%. Set the seed to 1997.

set.seed(1997)
compute_prob <- function(n, B = 10^3){ 
 ## Your code here
} 
## Your code here
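A sketch of the "several values of n" step, again on the hypothetical dice analog so the exercise itself stays open: `sapply` applies the Monte Carlo function to each size, and `which(... > 0.5)[1]` picks the first size whose estimate crosses 50%. The names `same_face` and `compute_prob_dice` are illustrative only.

```r
# Hypothetical dice analog: Pr(at least two of k dice show the same face)
same_face <- function(k) any(duplicated(sample(1:6, k, replace = TRUE)))

# Monte Carlo estimate for a given number of dice k
compute_prob_dice <- function(k, B = 10^3) {
  mean(replicate(B, same_face(k)))
}

set.seed(1997)
k <- 1:6
probs <- sapply(k, compute_prob_dice)   # one estimate per number of dice

# Smallest k whose estimated probability exceeds 50%
first_k <- k[which(probs > 0.5)[1]]
```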

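These probabilities can also be computed exactly instead of relying on Monte Carlo approximations. By the multiplication rule, the probability that all $n$ birthdays are different is

\[ \mbox{Pr}(n\mbox{ different birthdays}) = 1 \times \frac{364}{365}\times\frac{363}{365} \times \dots \times \frac{365-n + 1}{365} \]

so the probability that at least two people share a birthday is $1 - \mbox{Pr}(n\mbox{ different birthdays})$.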

Plot the probabilities you obtained using Monte Carlo as points and the exact probabilities with a red line.

Hint: use the function prod to compute the exact probabilities.
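To illustrate the `prod` hint, here is the same multiplication rule on the hypothetical dice analog, with 6 equally likely faces in place of 365 days. The vector `(6 - 0:(k - 1)) / 6` holds the factors $\frac{6}{6}, \frac{5}{6}, \dots, \frac{6-k+1}{6}$, and `prod` multiplies them.

```r
# Exact Pr(at least two of k six-sided dice match), hypothetical analog
exact_prob_dice <- function(k) {
  prob_all_different <- prod((6 - 0:(k - 1)) / 6)  # 6/6 * 5/6 * ... * (6-k+1)/6
  1 - prob_all_different                            # complement: some pair matches
}

exact <- sapply(1:6, exact_prob_dice)
```

Swapping 6 for 365 gives the birthday version.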

exact_prob <- function(n){ 
 ## Your code here
} 
## Your code here
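One way to draw points plus a red line in base R graphics, sketched on the hypothetical dice analog: `plot` draws the Monte Carlo estimates as points and `lines(..., col = "red")` overlays the exact curve. With ggplot2, `geom_point()` plus `geom_line(color = "red")` gives the same picture.

```r
# Hypothetical dice analog of same_birthday and exact_prob
same_face <- function(k) any(duplicated(sample(1:6, k, replace = TRUE)))
exact_prob_dice <- function(k) 1 - prod((6 - 0:(k - 1)) / 6)

set.seed(1)
k <- 1:6
mc <- sapply(k, function(kk) mean(replicate(10^3, same_face(kk))))
exact <- sapply(k, exact_prob_dice)

# Monte Carlo estimates as points, exact probabilities as a red line
plot(k, mc, xlab = "Number of dice", ylab = "Pr(at least two match)")
lines(k, exact, col = "red")
```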

Note that the points don’t quite match the red line. This is because our Monte Carlo simulation used only 1,000 iterations. Repeat the Monte Carlo estimate from exercise 2 for n = 23, using each value in B <- seq(10, 250, 5)^2 as the number of iterations. Plot the estimated probability against sqrt(B). At what value of sqrt(B) do the estimates consistently stay within 0.005 of the exact probability? Add horizontal lines at the exact probability $\pm$ 0.005. Note that this may take several seconds to run. Set the seed to 1998.

set.seed(1998)
B <- seq(10, 250, 5)^2
## Your code here
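The Monte Carlo standard error of an estimated proportion is $\sqrt{p(1-p)/B}$, so the scatter of the estimates shrinks like $1/\sqrt{B}$; that is why sqrt(B) is a natural x-axis. A sketch of the plot on the hypothetical dice analog (3 dice), with a smaller grid of B than the exercise uses so that it runs quickly. Note that `abline(h = ...)` accepts a vector and draws one horizontal line per value.

```r
# Hypothetical dice analog; exact Pr(match) for 3 dice is 4/9
same_face <- function(k) any(duplicated(sample(1:6, k, replace = TRUE)))
exact3 <- 1 - prod((6 - 0:2) / 6)

set.seed(1998)
B <- seq(10, 100, 10)^2              # smaller grid than the exercise, to run fast
estimates <- sapply(B, function(b) mean(replicate(b, same_face(3))))

# Estimates against sqrt(B), with reference lines at the exact value +/- 0.005
plot(sqrt(B), estimates, xlab = "sqrt(B)", ylab = "Monte Carlo estimate")
abline(h = exact3, col = "red")
abline(h = exact3 + c(-0.005, 0.005), lty = 2)
```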

Repeat the comparison from question 4 (Monte Carlo points vs. the exact red line), but use your findings from question 5 to choose a number of iterations B large enough that the points practically fall on the red line.

Hint: If you choose too many iterations, you will get the correct plot, but your document might not render in under five minutes.

n <- seq(1,60) 
## Your code here
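One way to act on the rendering-time hint is to time a single batch with base R's `system.time` before committing to a large B. If one size takes about t seconds, the 60 sizes in `n <- seq(1,60)` take on the order of 60t. The dice function `same_face` is a hypothetical stand-in for same_birthday.

```r
# Hypothetical stand-in for same_birthday
same_face <- function(k) any(duplicated(sample(1:6, k, replace = TRUE)))

# Time one batch of simulations at a candidate B
timing <- system.time(replicate(10^4, same_face(3)))
elapsed <- timing[["elapsed"]]   # wall-clock seconds for this batch
```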

In American Roulette, with 18 red slots, 18 black slots, and 2 green slots (0 and 00), what is the probability of landing on a green slot?

\[ \mbox{Derivation here} \]
The payout for winning on green is 17 dollars. This means that if you bet one dollar and the ball lands on green, you win 17 dollars; if it lands on red or black, you lose your dollar. Create a sampling model to simulate the random variable $X$ representing the casino’s profit from a single one-dollar bet on green. Use the sample function.

## Your code here
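A sketch of a sampling model for a different bet, so the green exercise stays open: assume a one-dollar bet on red, which in American roulette pays even money. From the casino’s side the profit is -1 with probability 18/38 (red comes up) and +1 with probability 20/38 (black or green). The `prob` argument of `sample` supplies these unequal probabilities.

```r
# Casino's profit from one hypothetical one-dollar bet on red
outcomes <- c(-1, 1)            # casino pays 1 on a red win, keeps the dollar otherwise
probs <- c(18/38, 20/38)        # 18 red slots vs. 18 black + 2 green slots
set.seed(3)
X_red <- sample(outcomes, size = 1, prob = probs)
```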

Now create a random variable $S$ representing the casino’s total winnings if $n = 1{,}000$ people each bet one dollar on green. Use a Monte Carlo simulation with B = 10,000 trials to estimate the probability that the casino loses money.

n <- 1000
## Your code here
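A sketch of the same structure for the hypothetical even-money bet on red (casino wins 1 with probability 20/38, loses 1 with probability 18/38): each replicate draws n outcomes with `replace = TRUE` and sums them, and the proportion of negative sums estimates the probability that the casino loses money.

```r
n <- 1000
B <- 10^4
outcomes <- c(-1, 1)            # casino's profit per hypothetical red bet
probs <- c(18/38, 20/38)

set.seed(4)
# Each replicate: total casino winnings over n independent red bets
S_red <- replicate(B, sum(sample(outcomes, n, replace = TRUE, prob = probs)))

# Proportion of simulations in which the casino ends up with a loss
p_loss <- mean(S_red < 0)
```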

What is the expected value of $X$?

\[ \mbox{Your derivation here.} \]
What is the standard error of $X$?

\[ \mbox{Your derivation here.} \]
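For reference, the general formulas for a random variable that takes the value $a$ with probability $p$ and $b$ with probability $1-p$ (plug in the green-bet values yourself):

\[ \mbox{E}[X] = a\,p + b\,(1-p), \qquad \mbox{SE}[X] = |b - a|\sqrt{p\,(1-p)} \]

As an illustration with a hypothetical even-money bet on red from the casino’s side ($a = 1$, $p = 20/38$, $b = -1$): $\mbox{E}[X] = 2/38 \approx 0.053$ and $\mbox{SE}[X] = 2\sqrt{(20/38)(18/38)} \approx 0.999$.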
What is the expected value of $S$? Does the Monte Carlo simulation confirm this?

\[ \mbox{Your derivation here} \]

## Your code here
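Because $S$ is a sum of $n$ draws $X_1, \dots, X_n$, each with the same distribution as $X$, linearity of expectation (which does not require independence) gives

\[ \mbox{E}[S] = \mbox{E}\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \mbox{E}[X_i] = n\,\mbox{E}[X] \]

For example, with a hypothetical even-money bet on red (casino’s expected profit $2/38$ per bet) and $n = 1{,}000$ bets, $\mbox{E}[S] = 1000 \times 2/38 \approx 52.6$ dollars.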

What is the standard error of $S$? Does the Monte Carlo simulation confirm this?

\[ \mbox{Your derivation here.} \]

## Your code here
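For independent draws, variances add, so $\mbox{SE}[S] = \sqrt{n}\,\mbox{SE}[X]$. A sketch of the Monte Carlo check for both $\mbox{E}[S]$ and $\mbox{SE}[S]$, using a hypothetical even-money bet on red from the casino’s side ($a = 1$ with probability $p = 20/38$, $b = -1$ otherwise).

```r
n <- 1000
B <- 10^4
a <- 1; b <- -1; p <- 20/38              # hypothetical red bet, casino's side
mu <- a * p + b * (1 - p)                # E[X]
sigma <- abs(b - a) * sqrt(p * (1 - p))  # SE[X]

set.seed(5)
S_red <- replicate(B, sum(sample(c(a, b), n, replace = TRUE, prob = c(p, 1 - p))))

# Theory vs. simulation for the expected value and standard error of S
c(theory = n * mu, simulated = mean(S_red))
c(theory = sqrt(n) * sigma, simulated = sd(S_red))
```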

Use data visualization to convince yourself that the distribution of $S$ is approximately normal. Make a histogram and a QQ-plot of the standardized values of $S$. If the approximation is good, the points in the QQ-plot should fall close to the identity line.

## Your code here
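A sketch of the two plots in base R, again on the hypothetical even-money bet on red (casino wins 1 with probability 20/38). Standardizing subtracts the mean and divides by the SD, so a normal $S$ should produce a QQ-plot hugging the line with intercept 0 and slope 1, which `abline(0, 1)` draws.

```r
n <- 1000; B <- 10^4
p <- 20/38                       # hypothetical red bet, casino's side
set.seed(6)
S_red <- replicate(B, sum(sample(c(1, -1), n, replace = TRUE, prob = c(p, 1 - p))))

# Standardize with the simulated mean and SD
z <- (S_red - mean(S_red)) / sd(S_red)

hist(z, breaks = 50, main = "Standardized S", xlab = "z")
qqnorm(z)
abline(0, 1, col = "red")        # identity line
```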

Notice that the normal approximation is slightly off in the tails of the distribution. Which would improve it: increasing the number of people playing, $n$, or increasing the number of Monte Carlo iterations, $B$? Explain your reasoning.

Answer here
Now use the Central Limit Theorem (CLT) to approximate the probability that the casino loses money. Does it agree with the Monte Carlo estimate?

\[ \mbox{Your derivation here.} \]

## Your code here
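Under the CLT, $S$ is approximately normal with mean $n\,\mbox{E}[X]$ and standard error $\sqrt{n}\,\mbox{SE}[X]$, so $\mbox{Pr}(S < 0)$ is a single `pnorm` call. A sketch for the hypothetical even-money bet on red (casino’s side). Because $S$ is discrete (with ±1 payouts and even $n$, it only takes even values), the Monte Carlo estimate of $\mbox{Pr}(S < 0)$ and the continuous approximation can differ slightly.

```r
n <- 1000
p <- 20/38                               # hypothetical red bet, casino's side
mu <- 1 * p + (-1) * (1 - p)             # E[X]
sigma <- 2 * sqrt(p * (1 - p))           # SE[X] = |b - a| * sqrt(p(1 - p))

# CLT: S is approximately Normal(n * mu, sqrt(n) * sigma)
p_loss_clt <- pnorm(0, mean = n * mu, sd = sqrt(n) * sigma)
```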

What is the minimum number of people $n$ who must bet on green for the casino’s probability of losing money to be at most 1%? Check your answer with a Monte Carlo simulation.

\[ \mbox{Your derivation here.} \]

## Your code here
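With $\mu = \mbox{E}[X] > 0$ and $\sigma = \mbox{SE}[X]$, the CLT gives $\mbox{Pr}(S < 0) \approx \Phi(-\sqrt{n}\,\mu/\sigma)$. Requiring this to be at most 0.01 means $\sqrt{n}\,\mu/\sigma \ge z_{0.99}$, that is, $n \ge (z_{0.99}\,\sigma/\mu)^2$, where $z_{0.99}$ is `qnorm(0.99)`. A sketch of that calculation for the hypothetical even-money bet on red (casino’s side):

```r
p <- 20/38                               # hypothetical red bet, casino's side
mu <- 2 * p - 1                          # E[X] = 1*p + (-1)*(1 - p)
sigma <- 2 * sqrt(p * (1 - p))           # SE[X]

# Smallest n with CLT-approximate Pr(S < 0) at most 0.01
n_min <- ceiling((qnorm(0.99) * sigma / mu)^2)

# CLT check at the chosen n (should be at or just below 0.01)
p_at_n_min <- pnorm(-sqrt(n_min) * mu / sigma)
```

The Monte Carlo check reuses the simulation pattern from the earlier exercises with `n <- n_min`.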