Problem set 2

Published

September 19, 2024

For these exercises, do not load any packages other than dslabs.

Make sure to use vectorization whenever possible.

What is the sum of the first 100 positive integers? Use the functions seq and sum to compute the sum with R for any n.

# Your code here

Load the US murders dataset from the dslabs package. Use the function str to examine the structure of the murders object. What are the column names used by the data frame for these five variables? Show the subset of murders showing states with less than 1 per 100,000 deaths. Show all variables.

library(dslabs)
str(murders)

'data.frame':   51 obs. of  5 variables:
 $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
 $ abb       : chr  "AL" "AK" "AZ" "AR" ...
 $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
 $ population: num  4779736 710231 6392017 2915918 37253956 ...
 $ total     : num  135 19 232 93 1257 ...

# Your code here

Show the subset of murders showing states with less than 1 per 100,000 deaths and in the West of the US. Don’t show the region variable.

# Your code here

Show the largest state with a rate less than 1 per 100,000.

# Your code here

Show the state with a population of more than 10 million with the lowest rate.

# Your code here

Compute the rate for each region of the US.

# Your code here

Create a vector of numbers that starts at 6, does not pass 55, and adds numbers in increments of 4/7: 6, 6 + 4/7, 6 + 8/7, and so on. How many numbers does the list have? Hint: use seq and length.

# Your code here

Make this data frame:

temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", 
          "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Convert the temperatures to Celsius.

# Your code here

Write a function euler that compute the following sum for any \(n\):

\[ S_n = 1+1/2^2 + 1/3^2 + \dots 1/n^2 \]

# Your code here

Show that as \(n\) gets bigger we get closer \(\pi^2/6\) by plotting \(S_n\) versus \(n\) with a horizontal dashed line at \(\pi^2/6\).

# Your code here

Use the %in% operator and the predefined object state.abb to create a logical vector that answers the question: which of the following are actual abbreviations: MA, ME, MI, MO, MU?

# Your code here

Extend the code you used in the previous exercise to report the one entry that is not an actual abbreviation. Hint: use the ! operator, which turns FALSE into TRUE and vice versa, then which to obtain an index.

# Your code here

In the murders dataset, use %in% to show all variables for New York, California, and Texas, in that order.

# Your code here

Write a function called vandermonde_helper that for any \(x\) and \(n\), returns the vector \((1, x, x^2, x^3, \dots, x^n)\). Show the results for \(x=3\) and \(n=5\).

# Your code here

Create a vector using:

n <- 10000
p <- 0.5
set.seed(2024-9-6)
x <- sample(c(0,1), n, prob = c(1 - p, p), replace = TRUE)

Compute the length of each stretch of 1s and then plot the distribution of these values. Check to see if the distribution follows a geometric distribution as the theory predicts. Do not use a loop!

# Your code here