# Your code here

Problem set 2
For these exercises, do not load any packages other than dslabs.
Make sure to use vectorization whenever possible.
- What is the sum of the first 100 positive integers? Use the functions `seq` and `sum` to compute the sum with R for any `n`.
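Here is a minimal sketch of one possible answer. Wrapping `seq` and `sum` in a function makes it work for any `n`. The helper name `sum_first_n` is hypothetical. The code assumes `n` is a positive integer, because for `n = 0` the call `seq(1, 0)` counts down and gives the wrong answer.

```r
# Hypothetical helper: sum of 1, 2, ..., n using seq() and sum()
sum_first_n <- function(n) {
  sum(seq(1, n))
}

sum_first_n(100)
#> [1] 5050
```

The result agrees with the closed form \(n(n+1)/2 = 100 \cdot 101 / 2 = 5050\).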
- Load the US murders dataset from the `dslabs` package. Use the function `str` to examine the structure of the `murders` object. What are the column names used by the data frame for these five variables? Show the subset of `murders` containing the states with a murder rate below 1 per 100,000 people. Show all variables.
library(dslabs)
Warning: package 'dslabs' was built under R version 4.4.3
str(murders)
'data.frame': 51 obs. of 5 variables:
$ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
$ abb : chr "AL" "AK" "AZ" "AR" ...
$ region : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
$ population: num 4779736 710231 6392017 2915918 37253956 ...
$ total : num 135 19 232 93 1257 ...
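Here is one way to answer the remaining questions. It assumes the murder rate is `total / population * 100000`. The function `names` lists the column names. A logical row index with an empty column index keeps every variable. The object names `rate` and `low_rate` are illustrative.

```r
library(dslabs)

# Column names of the five variables
names(murders)

# Murder rate per 100,000 people, computed for all states at once
rate <- murders$total / murders$population * 100000

# Logical row index; the empty column index keeps every variable
low_rate <- murders[rate < 1, ]
low_rate
```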
# Your code here

- Show the subset of `murders` containing the states in the West of the US with a murder rate below 1 per 100,000 people. Don’t show the `region` variable.
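A possible sketch combines both conditions with the vectorized `&` operator. It drops `region` with a logical column index. The sketch recomputes the same illustrative `rate` vector so that it runs on its own.

```r
library(dslabs)

rate <- murders$total / murders$population * 100000

# Both conditions must hold, combined element-wise with &
keep_rows <- rate < 1 & murders$region == "West"
# Keep every column except region
keep_cols <- names(murders) != "region"

west_low <- murders[keep_rows, keep_cols]
west_low
```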
# Your code here

- Show the largest (most populous) state with a murder rate below 1 per 100,000 people.
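Here "largest" is assumed to mean most populous, since the data frame has no area column. The sketch first keeps only the low-rate states. It then uses `which.max` on their populations.

```r
library(dslabs)

rate <- murders$total / murders$population * 100000
low <- murders[rate < 1, ]

# which.max() gives the position of the largest population among low-rate states
largest_low <- low[which.max(low$population), ]
largest_low
```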
# Your code here

- Of the states with a population of more than 10 million, show the one with the lowest murder rate.
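This sketch follows the same pattern and assumes "more than 10 million" means `population > 10^7`. It restricts the data to the large states and then applies `which.min` to their rates.

```r
library(dslabs)

rate <- murders$total / murders$population * 100000
big <- murders$population > 10^7

# Among states above 10 million people, pick the row with the smallest rate
big_states <- murders[big, ]
big_rates <- rate[big]
lowest_big <- big_states[which.min(big_rates), ]
lowest_big
```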
# Your code here

- Compute the murder rate for each region of the US.
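Here is a possible answer. It assumes the regional rate comes from regional totals, meaning total murders in the region divided by total regional population. It is not the average of the state rates. `tapply` does the grouping without a loop.

```r
library(dslabs)

# Sum murders and population within each region
region_totals <- tapply(murders$total, murders$region, sum)
region_pop <- tapply(murders$population, murders$region, sum)

# Population-weighted rate per 100,000 people, one value per region
region_rate <- region_totals / region_pop * 100000
region_rate
```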
# Your code here

- Create a vector of numbers that starts at 6, does not exceed 55, and increases in increments of 4/7: 6, 6 + 4/7, 6 + 8/7, and so on. How many numbers does the vector have? Hint: use `seq` and `length`.
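In this sketch, `seq` with `by = 4/7` stops at the last value that does not exceed 55. `length` then counts the values. The name `nums` is illustrative.

```r
# Start at 6, step by 4/7, never go past 55
nums <- seq(6, 55, by = 4/7)
length(nums)
#> [1] 86
```

A hand count agrees: \(\lfloor 49 / (4/7) \rfloor + 1 = \lfloor 85.75 \rfloor + 1 = 86\).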
# Your code here

- Make this data frame:
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro",
"San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Convert the temperatures to Celsius.
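This sketch assumes the given temperatures are in Fahrenheit. It converts them with \(C = \frac{5}{9}(F - 32)\), applied to the whole column at once.

```r
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro",
          "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

# Fahrenheit to Celsius, vectorized over the whole column
city_temps$temperature <- 5 / 9 * (city_temps$temperature - 32)
city_temps
```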
# Your code here

- Write a function `euler` that computes the following sum for any \(n\):
\[ S_n = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \dots + \frac{1}{n^2} \]
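Here is one possible implementation. It vectorizes over the terms \(1/i^2\) instead of looping. It uses `seq_len`, which assumes `n` is a non-negative whole number.

```r
# S_n = 1 + 1/2^2 + ... + 1/n^2, computed without a loop
euler <- function(n) {
  sum(1 / seq_len(n)^2)
}

euler(1)
euler(2)
```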
# Your code here

- Show that as \(n\) gets bigger, \(S_n\) gets closer to \(\pi^2/6\) by plotting \(S_n\) versus \(n\) with a horizontal dashed line at \(\pi^2/6\).
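This sketch uses base graphics only, so no extra packages are loaded. Because \(S_n\) is a running sum, `cumsum(1 / n^2)` gives \(S_1, \dots, S_{100}\) in one vectorized call. The result matches calling `euler` on each `n`. The upper limit of 100 is an arbitrary choice.

```r
# S_1, ..., S_100 as a running sum of 1/i^2 (vectorized via cumsum)
n <- 1:100
s_n <- cumsum(1 / n^2)

plot(n, s_n, type = "l", xlab = "n", ylab = expression(S[n]),
     ylim = c(1, pi^2 / 6 + 0.05))
# Horizontal dashed reference line at the limit pi^2 / 6
abline(h = pi^2 / 6, lty = 2)
```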
# Your code here

- Use the `%in%` operator and the predefined object `state.abb` to create a logical vector that answers the question: which of the following are actual abbreviations: MA, ME, MI, MO, MU?
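In this sketch, `%in%` returns one `TRUE`/`FALSE` per candidate. `state.abb` ships with base R's `datasets` package, so no extra package is needed.

```r
candidates <- c("MA", "ME", "MI", "MO", "MU")

# TRUE where the candidate appears in state.abb
is_abb <- candidates %in% state.abb
is_abb
#> [1]  TRUE  TRUE  TRUE  TRUE FALSE
```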
# Your code here

- Extend the code you used in the previous exercise to report the one entry that is not an actual abbreviation. Hint: use the `!` operator, which turns `FALSE` into `TRUE` and vice versa, then `which` to obtain an index.
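Building on the same vector, a possible answer negates it with `!` and passes the result to `which`. `%in%` binds more tightly than `!`. So `!candidates %in% state.abb` means `!(candidates %in% state.abb)`.

```r
candidates <- c("MA", "ME", "MI", "MO", "MU")

# Negate the logical vector, then which() returns the index of the odd one out
idx <- which(!candidates %in% state.abb)
candidates[idx]
#> [1] "MU"
```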
# Your code here

- In the `murders` dataset, use `%in%` to show all variables for New York, California, and Texas, in that order.
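This sketch shows why `%in%` alone is not quite enough. It picks the right rows but returns them in the data set's own alphabetical order: California, New York, Texas. Re-indexing with `match` restores the requested order. The names `wanted`, `subset_rows` and `ordered_rows` are illustrative.

```r
library(dslabs)

wanted <- c("New York", "California", "Texas")

# %in% finds the right rows, but keeps the data set's own order
subset_rows <- murders[murders$state %in% wanted, ]
subset_rows

# match() reorders those rows to follow the order of `wanted`
ordered_rows <- subset_rows[match(wanted, subset_rows$state), ]
ordered_rows
```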
# Your code here

- Write a function called `vandermonde_helper` that, for any \(x\) and \(n\), returns the vector \((1, x, x^2, x^3, \dots, x^n)\). Show the results for \(x=3\) and \(n=5\).
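Here is a minimal sketch. R's `^` is vectorized over its exponent, so `x^(0:n)` produces all the powers at once.

```r
# Returns c(x^0, x^1, ..., x^n) without a loop
vandermonde_helper <- function(x, n) {
  x^(0:n)
}

vandermonde_helper(3, 5)
#> [1]   1   3   9  27  81 243
```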
# Your code here

- Create a vector using:
n <- 10000
p <- 0.5
set.seed(2024-9-6)
x <- sample(c(0,1), n, prob = c(1 - p, p), replace = TRUE)

Compute the length of each stretch of 1s and then plot the distribution of these values. Check whether the distribution follows a geometric distribution, as the theory predicts. Do not use a loop!
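Here is a possible loop-free approach using base R. `rle` encodes `x` as runs of equal values. The lengths of the runs whose value is 1 are therefore exactly the stretches of 1s. If draws are independent, a stretch has length \(k\) with probability \((1-p)p^{k-1}\). In R's parameterization, this is `dgeom(k - 1, prob = 1 - p)`, where `dgeom` counts failures before the first success. The last stretch may be cut off by the end of the vector; with 10,000 draws this should matter little.

```r
n <- 10000
p <- 0.5
set.seed(2024-9-6)
x <- sample(c(0,1), n, prob = c(1 - p, p), replace = TRUE)

# rle() splits x into runs of identical values (no explicit loop)
runs <- rle(x)
ones <- runs$lengths[runs$values == 1]   # lengths of the stretches of 1s

# Observed proportion of each stretch length
obs <- table(ones) / length(ones)
k <- as.numeric(names(obs))

# Geometric prediction: P(length = k) = (1 - p) * p^(k - 1)
expected <- dgeom(k - 1, prob = 1 - p)

# Bars = observed proportions, points = geometric prediction
mids <- barplot(obs, xlab = "Length of stretch of 1s", ylab = "Proportion",
                ylim = c(0, 1.1 * max(obs, expected)))
points(as.vector(mids), expected, pch = 19)

cbind(k = k, observed = as.numeric(obs), expected = expected)
```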
# Your code here

- In the `murders` dataset, create a logical vector that indicates which states have both a murder rate higher than the national average AND a population greater than 5 million. Then use `ifelse` to create a character vector that labels these states as “High Crime, High Pop”, states with a murder rate higher than average but a population ≤ 5 million as “High Crime, Low Pop”, and all other states as “Lower Crime”.
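This sketch assumes "national average" means the overall US rate: total murders over total population, times 100,000. It does not use the mean of the 51 state rates. Swap in `mean(rate)` if the latter is intended. Nested `ifelse` calls then assign one label per state.

```r
library(dslabs)

rate <- murders$total / murders$population * 100000
# Assumption: national average = overall US rate, not mean(rate)
us_rate <- sum(murders$total) / sum(murders$population) * 100000

high_rate <- rate > us_rate
big_pop <- murders$population > 5e6
high_and_big <- high_rate & big_pop   # the requested logical vector

# Nested ifelse: checked in order, one label per state
label <- ifelse(high_and_big, "High Crime, High Pop",
                ifelse(high_rate, "High Crime, Low Pop", "Lower Crime"))
table(label)
```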
# Your code here

- Use the `order`, `rank`, and `sort` functions on the murder rates to answer the following: What is the murder rate of the state that ranks 10th in terms of murder rate? Show your work by creating the murder rate vector, then using the appropriate function to find the 10th-ranked value.
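This sketch assumes "ranks 10th" counts down from the highest rate. All three functions lead to the same value. `sort` returns the sorted values. `order` returns the indices that sort the vector, which also names the state. `rank` returns each element's rank; ranking `-rate` makes the highest rate rank 1.

```r
library(dslabs)

# Murder rate vector, one value per state
rate <- murders$total / murders$population * 100000

# sort(): the values themselves, highest first
sorted_rates <- sort(rate, decreasing = TRUE)
sorted_rates[10]

# order(): indices that sort the vector, which also identifies the state
o <- order(rate, decreasing = TRUE)
murders$state[o[10]]
rate[o[10]]

# rank(): rank of each element; ranking -rate makes the highest rate rank 1
r <- rank(-rate)
rate[r == 10]
```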
# Your code here

- Write a function called `compute_harmonic_mean` that takes a numeric vector and returns the harmonic mean, \(n / \sum_{i=1}^{n} 1/x_i\). The function should return `NA` if any values are zero or negative. Test your function on the vector `c(1, 2, 4, 8)` and show that it returns approximately 2.13.
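Here is a minimal sketch. It returns `NA_real_` when any input is zero or negative. It assumes the input has no missing values. With an `NA` and no negatives, the `if` condition would itself be `NA` and raise an error.

```r
compute_harmonic_mean <- function(x) {
  # Only defined here for strictly positive values
  if (any(x <= 0)) {
    return(NA_real_)
  }
  length(x) / sum(1 / x)
}

compute_harmonic_mean(c(1, 2, 4, 8))
#> [1] 2.133333
```

Here \(4 / (1 + 1/2 + 1/4 + 1/8) = 4 / 1.875 = 32/15 \approx 2.13\).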
# Your code here

- Create a function called `safe_divide` that takes two arguments `x` and `y` and returns their ratio `x/y`, but returns the string “Cannot divide by zero” when `y` is zero. Use vectorization so that the function works element-wise on vectors. Test it on the vectors `x <- c(10, 20, 30)` and `y <- c(2, 0, 5)`.
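This sketch uses `ifelse`, which works element-wise. A vector can hold only one type. Mixing the message with numbers therefore coerces the whole result to character, and the quotients come back as strings such as `"5"`.

```r
safe_divide <- function(x, y) {
  # Element-wise: message where y is 0, the quotient elsewhere.
  # The result is a character vector because of the string.
  ifelse(y == 0, "Cannot divide by zero", x / y)
}

x <- c(10, 20, 30)
y <- c(2, 0, 5)
safe_divide(x, y)   # character vector: "5", "Cannot divide by zero", "6"
```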
# Your code here

- Using the `murders` dataset, write a function called `classify_state_safety` that takes a state name as input and returns a classification based on the murder rate: “Very Safe” (rate < 1), “Safe” (rate 1-3), “Moderate” (rate 3-5), or “High Risk” (rate > 5). If the state name is not found, return “State not found”. Test your function on “Vermont”, “Texas”, “California”, and “NotAState”. Then use `sapply` to classify all states and use the `table` function to show how many states fall into each safety category.
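This sketch rests on two assumptions. First, the boundaries are read as Very Safe below 1, Safe from 1 up to 3, Moderate above 3 up to 5, and High Risk above 5. Second, the function looks up rates in a `rate` vector computed once outside it. `match` returns `NA` for names not in `murders$state`, which triggers the "State not found" branch. The function handles one name at a time, which is why `sapply` is used to apply it to many.

```r
library(dslabs)

# Murder rate per 100,000 people, one value per state (vectorized)
rate <- murders$total / murders$population * 100000

classify_state_safety <- function(state_name) {
  idx <- match(state_name, murders$state)   # NA if the name is not found
  if (is.na(idx)) {
    return("State not found")
  }
  r <- rate[idx]
  # Assumed boundaries: <1, [1, 3], (3, 5], >5
  if (r < 1) {
    "Very Safe"
  } else if (r <= 3) {
    "Safe"
  } else if (r <= 5) {
    "Moderate"
  } else {
    "High Risk"
  }
}

# Test cases from the exercise
sapply(c("Vermont", "Texas", "California", "NotAState"), classify_state_safety)

# Classify every state, then count states per category
categories <- sapply(murders$state, classify_state_safety)
table(categories)
```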
# Your code here