2025-09-18
We will be using the murders dataset in the dslabs package.
It includes data on 2010 gun murders for the 50 US states and DC.
We will use it to answer questions such as “What is the state with the lowest murder rate in the Western part of the US?”
First, some simple examples of vectorization.
Let’s convert the following heights in inches to meters:
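The code for this step did not survive in these notes. A minimal sketch, assuming the heights vector below; its values are inferred from the standardized output shown later (mean 68.7, standard deviation 3.335) and are not stated in the text itself:

```r
# Heights in inches; values inferred from the outputs in these notes (assumption)
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)

# One inch is 2.54 cm, so multiply by 2.54 and divide by 100 to get meters.
# The arithmetic is applied to every element at once.
heights_m <- heights * 2.54 / 100
heights_m
```

No loop is needed: the scalar operations are recycled across all ten elements.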
We can subtract a constant from each element of a vector.
This is convenient for computing residuals or deviations from an average:
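A sketch of the computation, assuming the heights vector inferred from the output in these notes. Subtracting the mean gives the deviations; dividing by the standard deviation as well gives the standardized values printed next:

```r
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)  # assumed values

avg <- mean(heights)   # 68.7
s <- sd(heights)       # about 3.335

# Deviations from the average: the constant avg is subtracted from every element
deviations <- heights - avg

# Standardized heights: deviations expressed in units of standard deviations
z <- (heights - avg) / s
z
```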
[1] 0.08995503 -2.00899575 -0.80959530 0.38980515 0.38980515 1.28935548
[7] -0.50974519 1.28935548 -0.50974519 0.38980515
There is a function, scale, that does this. We describe it soon.
If we operate on two vectors, vectorization is componentwise.
Here is an example:
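The example itself is missing from these notes. As an illustration with two hypothetical vectors (not from the original), arithmetic pairs the first element with the first, the second with the second, and so on:

```r
# Hypothetical vectors, used only for illustration
x <- c(1, 2, 3)
y <- c(10, 20, 30)

x + y   # 11 22 33
x * y   # 10 40 90
y / x   # 10 10 10
```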
Add a column to the murders dataset with the murder rate.
Use murders per 100,000 persons as the unit.
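One possible solution, sketched under the assumption that the dslabs package is installed. The column name rate is an arbitrary choice, not something the exercise prescribes:

```r
library(dslabs)  # provides the murders data frame

# total and population are vectors of the same length, so the division is
# componentwise; multiplying by 10^5 expresses the result per 100,000 persons
murders$rate <- murders$total / murders$population * 10^5

head(murders[, c("state", "total", "population", "rate")])
```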
[,1]
[1,] 0.08995503
[2,] -2.00899575
[3,] -0.80959530
[4,] 0.38980515
[5,] 0.38980515
[6,] 1.28935548
[7,] -0.50974519
[8,] 1.28935548
[9,] -0.50974519
[10,] 0.38980515
attr(,"scaled:center")
[1] 68.7
attr(,"scaled:scale")
[1] 3.335
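The output above has the shape that scale returns. A sketch that reproduces it, assuming the heights vector inferred from the printed center (68.7) and scale (3.335):

```r
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)  # assumed values

z <- scale(heights)   # 10 x 1 matrix with center and scale stored as attributes
dim(z)                # 10 1

# Dropping the matrix structure recovers a plain numeric vector that matches
# the manual computation
manual <- (heights - mean(heights)) / sd(heights)
all.equal(as.numeric(z), manual)   # TRUE
```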
scale provides the same results as standardizing by hand, but it coerces the result to a column matrix.
Functions such as any and all convert vectors to logicals of length one, which is what if-else needs.
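A short sketch of why this matters, assuming the heights vector inferred from the scale output above. A condition such as heights > 72 has length ten, so it has to be collapsed to a single TRUE or FALSE before if can use it:

```r
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)  # assumed values

tall <- heights > 72   # logical vector of length 10

any(tall)   # TRUE: at least one height exceeds 72
all(tall)   # FALSE: not every height exceeds 72

# if needs a single logical value, which any() supplies
if (any(tall)) {
  msg <- "at least one person is taller than 72 inches"
} else {
  msg <- "nobody is taller than 72 inches"
}
msg
```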
A particularly useful function is ifelse, a vectorized version of if-else.
Here is an example:
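The original example is not shown in these notes; as a stand-in, here is a sketch with a hypothetical vector. ifelse evaluates the test for every element and picks from the second argument where the test is TRUE and from the third where it is FALSE:

```r
# Hypothetical input, for illustration only
a <- c(0, 1, 2, -4, 5)

# 1/a where a is positive, NA elsewhere; the result has the same length as a
result <- ifelse(a > 0, 1 / a, NA)
result   # NA 1.0 0.5 NA 0.2
```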
TRUE values
split
split is a useful function to get indexes using a factor:
The functions which, match, and the operator %in% are useful for subsetting.
To understand how they work it’s best to use examples.
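A sketch that would produce the output printed next, assuming dslabs is installed. %in% gives one TRUE or FALSE per state, and which turns those logicals into row indexes:

```r
library(dslabs)  # provides the murders data frame

x <- c("New York", "Florida", "Texas")

# Logical vector of length 51: is each state one of the three names?
in_x <- murders$state %in% x

# Positions of the TRUE values, in the order the states appear in the data
ind <- which(in_x)
ind

# Use the indexes to subset rows
murders[ind, ]
```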
[1] 10 33 44
      state abb    region population total
10  Florida  FL     South   19687653   669
33 New York  NY Northeast   19378102   517
44    Texas  TX     South   25145561   805
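Note that this is similar to using match, but the order is different: which with %in% returns indexes in the order they appear in the data, whereas match returns them in the order of the values being looked up.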
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[26] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3 NA NA
[51] NA
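To make the comparison concrete, here is a sketch assuming dslabs is installed. match(x, table) returns, for each element of x, its first position in table, so the result follows the order of x rather than the order of the data. Reversing the arguments yields one entry per state, with NA wherever the state is not among the names. The second lookup vector below is hypothetical, chosen only because it gives the same pattern as the output above (a single 3 at position 48, which is Washington):

```r
library(dslabs)

x <- c("New York", "Florida", "Texas")

match(x, murders$state)          # 33 10 44, in the order of x
which(murders$state %in% x)      # 10 33 44, in the order of the data

# Reversed arguments: one value per state, NA when there is no match.
# y is a hypothetical vector; only its third element is a state name.
y <- c("Boston", "Dakota", "Washington")
rev_match <- match(murders$state, y)
rev_match
```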
intersect, union, setdiff, setequal
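These four functions treat vectors as sets. A brief sketch with two hypothetical vectors of state abbreviations (not from the original):

```r
# Hypothetical sets of state abbreviations, for illustration only
a <- c("CA", "OR", "WA", "NV")
b <- c("NV", "AZ", "CA")

intersect(a, b)   # "CA" "NV": elements in both
union(a, b)       # "CA" "OR" "WA" "NV" "AZ": elements in either, no duplicates
setdiff(a, b)     # "OR" "WA": elements of a that are not in b
setequal(a, b)    # FALSE: the two do not contain the same elements
setequal(c("CA", "NV"), c("NV", "CA"))   # TRUE: order does not matter
```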
sapply - applies the function and simplifies if possible.
apply - applies a function to rows or columns of a matrix.
lapply - returns a list. Convenient when the function returns something other than a number.
tapply - can apply to subsets defined by a second variable.
mapply - multivector version of sapply.
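A sketch of each function on the murders data, assuming dslabs is installed. The final lapply call sums the population within each region and returns a list, which is the form of the output shown next; split, described earlier, supplies the per-region pieces:

```r
library(dslabs)

# split: population values grouped by the region factor (a list of 4 vectors)
pop_by_region <- split(murders$population, murders$region)

# sapply simplifies the result to a named numeric vector
sapply(pop_by_region, sum)

# tapply does the grouping and the summing in one call
tapply(murders$population, murders$region, sum)

# apply works on the rows (1) or columns (2) of a matrix
m <- as.matrix(murders[, c("population", "total")])
apply(m, 2, max)

# mapply walks over several vectors in parallel
mapply(function(tot, pop) tot / pop * 10^5,
       murders$total[1:3], murders$population[1:3])

# lapply always returns a list, one element per region
lapply(pop_by_region, sum)
```

Which one to use is mostly a matter of the shape wanted for the result: a vector (sapply, tapply) or a list (lapply).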
$Northeast
[1] 55317240
$South
[1] 115674434
$`North Central`
[1] 66927001
$West
[1] 71945553