Problem Set 9

Published: November 23, 2025

Instructions

  • Limited packages: Use only base R functions and the dslabs package
  • No for-loops: Use vectorized operations and built-in functions (functions like apply are acceptable except where specifically noted)

  1. Create a 100×10 matrix of normally distributed random numbers. Store the result in x.
set.seed(2025)
## your code here
  2. Use the three R functions that return: (a) the dimensions of x, (b) the number of rows of x, and (c) the number of columns of x.
## your code here
  3. Add row-specific scalars to matrix x: add 1 to row 1, add 2 to row 2, and so on.
## your code here
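One possible approach, sketched below, relies on the fact that R stores matrices column by column: a vector of length nrow(x) is recycled down each column, so its i-th element is added to row i. The sweep() call is shown only as a cross-check; the name x_rows is an illustrative choice that leaves x itself unchanged.

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# Column-major storage: a length-nrow(x) vector is recycled down each column,
# so element i is added to row i in every column
x_rows <- x + seq_len(nrow(x))

# Same result with sweep(); MARGIN = 1 says the vector lines up with rows
x_rows_sweep <- sweep(x, 1, seq_len(nrow(x)), FUN = "+")
```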
  4. Add column-specific scalars to matrix x: add 1 to column 1, add 2 to column 2, and so on. Hint: Use sweep with FUN = "+".
## your code here
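Column offsets are where recycling goes wrong: x + 1:ncol(x) would still be recycled down the columns, not across them. A minimal sketch under that observation uses sweep() with MARGIN = 2, plus an explicit byrow = TRUE matrix as a cross-check (x_cols and x_cols_alt are illustrative names).

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# MARGIN = 2: the STATS vector lines up with columns
x_cols <- sweep(x, 2, seq_len(ncol(x)), FUN = "+")

# Cross-check: every row of this helper matrix is 1, 2, ..., ncol(x)
x_cols_alt <- x + matrix(seq_len(ncol(x)), nrow(x), ncol(x), byrow = TRUE)
```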
  5. Compute the mean of each row of x.
## your code here
  6. Compute the mean of each column of x.
## your code here
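For problems 5 and 6, base R's rowMeans() and colMeans() are the vectorized tools; the sketch below also checks them against the slower apply() equivalents, which is just one way to confirm the margins are the right way round.

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

row_means <- rowMeans(x)   # one value per row (length 100)
col_means <- colMeans(x)   # one value per column (length 10)

# apply() gives the same numbers, but rowMeans()/colMeans() avoid per-row calls
stopifnot(isTRUE(all.equal(row_means, apply(x, 1, mean))))
stopifnot(isTRUE(all.equal(col_means, apply(x, 2, mean))))
```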
  7. Load the MNIST training data using dslabs. For each image, compute the proportion of its pixels that fall in the “grey area” (pixel values between 50 and 205, inclusive). Then create a boxplot of these proportions grouped by digit class (0-9). Hint: Use logical operators and rowMeans.
## your code here
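The key step is that a comparison on a matrix returns a logical matrix, and rowMeans() treats TRUE/FALSE as 1/0, so each row's mean is that image's grey-area proportion. The sketch below wraps this in a hypothetical helper, grey_proportion(), and runs it on a tiny made-up matrix; the commented lines show how it might be applied to dslabs::read_mnist() output, assuming the usual mnist$train$images and mnist$train$labels components.

```r
# Hypothetical helper: proportion of pixels per image in [50, 205]
grey_proportion <- function(images) {
  # Logical matrix; rowMeans() averages TRUE/FALSE as 1/0 within each row
  rowMeans(images >= 50 & images <= 205)
}

# Made-up stand-in for mnist$train$images: 3 "images" of 4 pixels each
toy_images <- rbind(c(0, 60, 200, 255),
                    c(0,  0,  50, 205),
                    c(10, 20, 30,  40))
toy_prop <- grey_proportion(toy_images)

# With the real data, something like:
# mnist <- dslabs::read_mnist()
# prop  <- grey_proportion(mnist$train$images)
# boxplot(prop ~ mnist$train$labels, xlab = "Digit", ylab = "Proportion grey")
```

In the toy example the inclusive bounds matter: the second row counts both 50 and 205, giving 0.5.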
  8. Use the solve function to find the solution to this system of linear equations:

\[ \begin{aligned} x + 2y - 2z &= -15\\ 2x + y - 5z &= -21\\ x - 4y + z &= 18 \end{aligned} \]

## your code here
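Writing the system as A %*% v = b, with one row of A per equation and columns for x, y, and z, lets solve(A, b) return v directly. Substituting back, the solution should be x = -1, y = -4, z = 3, which the last line of this sketch re-checks.

```r
# Coefficient matrix: one row per equation, columns for x, y, z
A <- matrix(c(1,  2, -2,
              2,  1, -5,
              1, -4,  1), nrow = 3, byrow = TRUE)
b <- c(-15, -21, 18)

sol <- solve(A, b)   # vector v with A %*% v equal to b

# Sanity check: multiplying back should reproduce the right-hand side
stopifnot(isTRUE(all.equal(as.vector(A %*% sol), b)))
```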
  9. Use matrix multiplication to compute the mean of each column of x and store the result as a single-row matrix. Hint: Create a 1×n matrix of weights (1/n, 1/n, …, 1/n) where n = nrow(x).
## your code here
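The dimensions explain why the hint works: a (1 × n) weight matrix times an (n × p) data matrix gives a (1 × p) result whose j-th entry is the weighted sum, here the mean, of column j. A minimal sketch (w and col_means_mm are illustrative names):

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

n <- nrow(x)
w <- matrix(1 / n, nrow = 1, ncol = n)   # 1 x n row of equal weights
col_means_mm <- w %*% x                  # (1 x n) %*% (n x p) gives 1 x p
```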
  10. Use matrix multiplication and other matrix operations to compute the standard deviation of each column of x. Do not use sweep, apply, or any built-in standard deviation functions. Hint: Recall that \(\text{sd} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\).
## your code here
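Without sweep, one way to center each column is an outer product: a column of ones (n × 1) times the row of column means (1 × p) builds an n × p matrix that repeats the means in every row. Multiplying the squared deviations by the transposed ones vector then sums each column. The sketch below follows that route; the intermediate names are illustrative.

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

n <- nrow(x)
ones <- matrix(1, nrow = n, ncol = 1)   # n x 1 column of ones
means <- t(ones) %*% x / n              # 1 x p column means
centered <- x - ones %*% means          # subtract each column's mean from every row
ss <- t(ones) %*% centered^2            # 1 x p sums of squared deviations
col_sds <- sqrt(ss / (n - 1))           # 1 x p standard deviations
```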
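  11. Load the MNIST training data and create a matrix small_x containing only the first 100 observations. Use matrix multiplication to compute the 100×100 correlation matrix between all pairs of these observations. Hint: First standardize each row (subtract its mean and divide by its standard deviation), then multiply the result by its transpose with %*% and divide by p - 1, where p = 784 is the number of pixels.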
## your code here
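If z holds the standardized rows, entry (i, k) of z %*% t(z) is the sum of products of standardized values for rows i and k, and dividing by p - 1 turns that into their Pearson correlation. The sketch assumes hypothetical helpers standardize_rows() and row_cor() and uses random numbers as a stand-in for small_x; rows with zero variance would produce NaN, which should not occur for real MNIST images.

```r
# Hypothetical helper: center and scale each row
standardize_rows <- function(m) {
  p <- ncol(m)
  centered <- m - rowMeans(m)                    # recycled down columns, i.e. per row
  row_sd <- sqrt(rowSums(centered^2) / (p - 1))  # sample sd of each row
  centered / row_sd                              # again applied per row
}

# Hypothetical helper: correlation between all pairs of rows
row_cor <- function(m) {
  z <- standardize_rows(m)
  (z %*% t(z)) / (ncol(m) - 1)                   # same as tcrossprod(z) / (p - 1)
}

# Random stand-in; with MNIST, e.g. small_x <- mnist$train$images[1:100, ]
set.seed(2025)
small_x <- matrix(runif(100 * 784, 0, 255), nrow = 100)
cor_mat <- row_cor(small_x)
```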
  12. Using the matrix x from problem 1, create a new matrix of the same size whose entries are 1 where the corresponding value of x is above its row mean and 0 otherwise. Do this using only matrix operations and logical indexing (no loops or apply functions).
## your code here
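Because rowMeans(x) has one value per row, comparing x against it recycles the means down each column, so entry [i, j] is compared with the mean of row i. The sketch shows two equivalent variants: arithmetic coercion of the logical matrix, and logical indexing into a zero matrix (above and above_idx are illustrative names).

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

is_above <- x > rowMeans(x)   # logical matrix, same dimensions as x

# Variant 1: multiplying by 1 turns TRUE/FALSE into 1/0
above <- is_above * 1

# Variant 2: logical indexing into a matrix of zeros
above_idx <- matrix(0, nrow(x), ncol(x))
above_idx[is_above] <- 1
```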
  13. Compute the Euclidean distance between row 1 and row 50 of matrix x using matrix multiplication. Store the result in a variable called distance. Hint: Use crossprod() or matrix multiplication with the transpose.
## your code here
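The Euclidean distance is the square root of the inner product of the difference vector with itself, and crossprod(d) computes t(d) %*% d as a 1 × 1 matrix. A minimal sketch:

```r
set.seed(2025)
x <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

d <- x[1, ] - x[50, ]                 # difference vector (length 10)
distance <- sqrt(crossprod(d))[1, 1]  # crossprod(d) is 1 x 1; [1, 1] extracts the number
```

As a cross-check, dist(x[c(1, 50), ]) should report the same value.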
  14. Create a 5×5 identity matrix using only the matrix() function and diag(). Store it in a variable called I5.
## your code here
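One way to combine the two functions is to start from a zero matrix built with matrix() and then use the replacement form diag() <- to set the main diagonal. (diag(5) alone also returns a 5×5 identity matrix.)

```r
I5 <- matrix(0, nrow = 5, ncol = 5)   # 5 x 5 matrix of zeros
diag(I5) <- 1                         # set the main diagonal to 1
```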
  15. Using the MNIST data, find which pixel position (row and column in the 28×28 image grid) has the highest average intensity across all training images. Return both the row and column indices. Hint: Use which.max() and arrayInd().
## your code here
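colMeans() gives one average per pixel, which.max() returns its linear index, and arrayInd() converts that index to a (first, second) index pair using R's column-major order, the same layout that matrix(v, 28, 28) uses. The helper brightest_pixel() and the random stand-in data are assumptions for illustration. Whether the first index corresponds to the image's row or its column depends on how the pixels were flattened, so checking the result against a plotted image is worthwhile.

```r
# Hypothetical helper: index pair of the brightest average pixel
brightest_pixel <- function(images, side = 28) {
  avg <- colMeans(images)                   # one average per pixel (length side^2)
  arrayInd(which.max(avg), c(side, side))   # linear index -> 1 x 2 integer matrix
}

# Random stand-in; with MNIST use mnist$train$images
set.seed(2025)
fake <- matrix(runif(5 * 784, 0, 255), nrow = 5)
fake[, 30] <- 255                           # make pixel 30 the brightest on average
pos <- brightest_pixel(fake)
```

In a 28×28 column-major layout, linear index 30 falls in position (2, 2).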
  16. Using the MNIST training data, take the first 100 observations and reshape them into a 100×28×28 multidimensional array. Then average across the first dimension to get a 28×28 matrix of per-pixel average intensities, stored as pixel_averages. The apply function is recommended here because it can average over chosen dimensions of a multidimensional array. You can visualize this average image using image(1:28, 1:28, pixel_averages[, 28:1]) to see what the “average” digit looks like across these 100 observations.
## your code here
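Because array() fills in column-major order, reshaping an n × 784 matrix to c(n, 28, 28) puts pixel (k - 1) * 28 + j of image i at arr[i, j, k]. apply() with MARGIN = c(2, 3) then keeps the two pixel dimensions and averages over the images. The helper name pixel_average_image() and the random stand-in are assumptions; colMeans(arr, dims = 1) is an equivalent alternative.

```r
# Hypothetical helper: average image across observations
pixel_average_image <- function(images, side = 28) {
  n <- nrow(images)
  # Column-major fill: arr[i, j, k] is pixel (k - 1) * side + j of image i
  arr <- array(images, dim = c(n, side, side))
  # Keep dimensions 2 and 3, averaging over dimension 1 (the images)
  apply(arr, c(2, 3), mean)
}

# Random stand-in; with MNIST, e.g. mnist$train$images[1:100, ]
set.seed(2025)
fake <- matrix(runif(100 * 784, 0, 255), nrow = 100)
pixel_averages <- pixel_average_image(fake)
# image(1:28, 1:28, pixel_averages[, 28:1])
```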
  17. Select the first 20 observations labeled as “1” and the first 20 observations labeled as “0” from the MNIST training data. Compute three correlation matrices: (a) correlations within the 1’s group, (b) correlations within the 0’s group, and (c) correlations between the 1’s and 0’s groups. Compare the average within-group correlation to the average between-group correlation.
## your code here
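cor() correlates columns, so the image matrices are transposed to make each image a column; cor(x, y) with two arguments gives the 20×20 between-group matrix. The sketch assumes two design choices that the problem leaves open: the within-group average excludes the diagonal (which is 1 by construction), and the data are a hypothetical simulation of two "prototype" images plus noise, standing in for the MNIST selections shown in the comments.

```r
# Hypothetical helper: average off-diagonal correlation within a group
mean_within <- function(group) {
  r <- cor(t(group))        # rows are images, so transpose
  mean(r[upper.tri(r)])     # skip the diagonal of 1s
}

# Simulated stand-in; with MNIST, something like:
# ones  <- mnist$train$images[which(mnist$train$labels == 1)[1:20], ]
# zeros <- mnist$train$images[which(mnist$train$labels == 0)[1:20], ]
set.seed(2025)
proto_one  <- runif(784)
proto_zero <- runif(784)
ones  <- t(replicate(20, proto_one  + rnorm(784, sd = 0.1)))
zeros <- t(replicate(20, proto_zero + rnorm(784, sd = 0.1)))

cor_ones    <- cor(t(ones))             # (a) 20 x 20
cor_zeros   <- cor(t(zeros))            # (b) 20 x 20
cor_between <- cor(t(ones), t(zeros))   # (c) rows are 1s, columns are 0s

summary_cor <- c(within_ones  = mean_within(ones),
                 within_zeros = mean_within(zeros),
                 between      = mean(cor_between))
```

With real digits the gap between within-group and between-group averages will differ from this simulation, so it is the comparison, not the numbers, that carries over.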