2024-09-18
We will be using the `murders` dataset in the `dslabs` package.
It includes data on 2010 gun murders for the 50 US states and DC.
We will use it to answer questions such as “What is the state with the lowest crime rate in the Western part of the US?”
First, some simple examples of vectorization.
Let’s convert the following heights in inches to meters:
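A minimal sketch of this conversion. The ten height values are an assumption: they are back-calculated from the standardized values and the center (68.7) and scale (3.335) printed later in these notes, and the variable names are illustrative.

```r
# Heights in inches (assumed values, inferred from output shown in the notes)
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)

# One inch is 2.54 cm; the arithmetic is applied to every element at once
heights_m <- heights * 2.54 / 100
heights_m
```

No loop is needed: the multiplication and division are each carried out elementwise.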
We can subtract a constant from each element of a vector.
This is convenient for computing residuals or deviations from an average:
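As a sketch, assuming the same ten heights as before (values inferred from the printed output, not confirmed by the notes), the deviations come from a single subtraction. Dividing them by the standard deviation should give the standardized values printed in the text.

```r
# Assumed heights in inches, inferred from the output shown in the notes
heights <- c(69, 62, 66, 70, 70, 73, 67, 73, 67, 70)

avg <- mean(heights)          # a single number
deviations <- heights - avg   # avg is subtracted from every element

s <- sd(heights)              # sample standard deviation
standard_units <- deviations / s
standard_units
```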
Dividing these deviations by the standard deviation gives the heights in standard units:
[1] 0.08995503 -2.00899575 -0.80959530 0.38980515 0.38980515 1.28935548
[7] -0.50974519 1.28935548 -0.50974519 0.38980515
There is a function, `scale`, that does this. We describe it soon.
If we operate on two vectors, vectorization is componentwise.
Here is an example:
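One way to see componentwise behavior is with heights recorded as separate feet and inches vectors. The values and names below are made up for illustration.

```r
# Hypothetical heights recorded as feet plus remaining inches
feet   <- c(5, 5, 6)
inches <- c(9, 2, 1)

# Element i of the result uses element i of each vector
total_inches <- feet * 12 + inches
total_inches
```

The first result combines the first entries of both vectors, the second combines the second entries, and so on.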
Add a column to the murders dataset with the murder rate.
Use murders per 100,000 persons as the unit.
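One possible solution, sketched on a stand-in data frame so that it runs without `dslabs`. The three rows are copied from the rows printed later in these notes; the column name `rate` is an assumption. With `dslabs` loaded, the same assignment should work on the full `murders` data frame.

```r
# Stand-in for the murders data frame (three rows taken from the notes)
murders_small <- data.frame(
  state      = c("Florida", "New York", "Texas"),
  abb        = c("FL", "NY", "TX"),
  region     = c("South", "Northeast", "South"),
  population = c(19687653, 19378102, 25145561),
  total      = c(669, 517, 805)
)

# Two columns divided componentwise, then scaled to per-100,000 units
murders_small$rate <- murders_small$total / murders_small$population * 10^5
murders_small
```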
Calling `scale` on the heights provides the same results, except that `scale` coerces to a column matrix:
[,1]
[1,] 0.08995503
[2,] -2.00899575
[3,] -0.80959530
[4,] 0.38980515
[5,] 0.38980515
[6,] 1.28935548
[7,] -0.50974519
[8,] 1.28935548
[9,] -0.50974519
[10,] 0.38980515
attr(,"scaled:center")
[1] 68.7
attr(,"scaled:scale")
[1] 3.335
The conditional function `if`-`else` does not vectorize.
Functions such as `any` and `all` convert vectors to the logicals of length one needed for `if`-`else`.
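A small sketch of the difference, using a made-up vector `x`. The `tryCatch` wrapper is only there so the example runs to completion: R 4.2.0 and later raise an error for a condition longer than one, while older versions warn and use only the first element.

```r
x <- c(-1, 2, 5)

# x > 0 has length 3, which is not a valid condition for `if`
status <- tryCatch(
  {
    if (x > 0) "positive" else "not positive"
    "no complaint"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
status

# all() and any() reduce the logical vector to a single TRUE or FALSE
all_msg <- if (all(x > 0)) "all positive" else "not all positive"
any_msg <- if (any(x > 0)) "at least one positive" else "none positive"
all_msg
any_msg
```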
A particularly useful function is `ifelse`, a vectorized version of `if`-`else`.
Here is an example:
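A minimal sketch with an illustrative vector `a`: `ifelse` checks the condition for each element and picks the matching value from the second or third argument.

```r
a <- c(0, 1, 2, -4, 5)

# Reciprocal for positive entries, NA for everything else
result <- ifelse(a > 0, 1 / a, NA)
result
```

The result has the same length as `a`, with `NA` in the positions where the condition is `FALSE`.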
`split` is a useful function to get indexes using a factor.
The functions `which`, `match`, and the operator `%in%` are useful for subsetting.
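Before the subsetting functions, here is a quick sketch of `split`. The three regions are those of Florida, New York, and Texas as printed in these notes; the variable names are illustrative.

```r
# Regions of Florida, New York, and Texas, stored as a factor
region <- factor(c("South", "Northeast", "South"))

# Split the positions 1, 2, 3 into one group per factor level
ind_by_region <- split(seq_along(region), region)
ind_by_region
```

The result is a list with one element per level, each holding the positions that belong to that level.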
To understand how they work it’s best to use examples.
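A sketch that runs without `dslabs`. The `state` vector is a stand-in for `murders$state`, built from R's built-in `state.name` with "District of Columbia" inserted after Delaware. That ordering is an assumption, chosen because it is consistent with the indexes printed in these notes.

```r
# Stand-in for murders$state: 50 built-in state names plus DC (assumed order)
state <- append(state.name, "District of Columbia", after = 8)
wanted <- c("New York", "Florida", "Texas")

in_set <- state %in% wanted        # logical vector, one entry per state
ind_which <- which(in_set)         # positions of TRUE, in the order of `state`
ind_match <- match(wanted, state)  # positions in the order of `wanted`

ind_which
ind_match
```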
[1] 10 33 44
state abb region population total
10 Florida FL South 19687653 669
33 New York NY Northeast 19378102 517
44 Texas TX South 25145561 805
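Note this is similar to using `match`, but the order is different: `match` returns positions in the order the names were requested, while `which` with `%in%` returns them in the order they appear in the dataset.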
The apply functions let us use the concept of vectorization with functions that don’t vectorize.
Here is an example of a function that won’t vectorize in a convenient way:
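As an illustration, here is a hypothetical function `sum_to_n` (not necessarily the one used in the original slides) that adds the integers from 1 to `n`. It works for a single number, but `1:n` only makes sense for one `n` at a time. The `tryCatch` wrapper is there so the example runs whether R warns or errors.

```r
# Hypothetical helper: sum of the integers 1, 2, ..., n
sum_to_n <- function(n) sum(1:n)

sum_to_n(3)   # works for a single value

# Passing a vector does not produce one sum per element
ns <- c(3, 5, 10)
outcome <- tryCatch(
  {
    sum_to_n(ns)
    "no complaint"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome
```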
One option is `sapply`, one of the apply functions. `sapply` will work on any vector, including lists.
There are other apply functions:
`lapply` - returns a list. Convenient when the function returns something other than a number.
`tapply` - can apply to subsets defined by a second variable.
`mapply` - multivariate version of `sapply`.
`apply` - applies a function to rows or columns of a matrix.
We will learn some of these as we go.
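The apply functions listed above can be sketched on toy inputs. The helper `sum_to_n` and all values are illustrative; the murder totals and regions are those of Florida, New York, and Texas as printed in these notes.

```r
# Hypothetical helper that does not vectorize on its own
sum_to_n <- function(n) sum(1:n)
ns <- c(3, 5, 10)

# sapply: call the function once per element, simplify to a vector
sapply_res <- sapply(ns, sum_to_n)

# lapply: same idea, but always returns a list (here the lengths differ)
lapply_res <- lapply(ns, function(n) 1:n)

# tapply: apply a function to subsets defined by a second variable
total  <- c(669, 517, 805)
region <- c("South", "Northeast", "South")
tapply_res <- tapply(total, region, sum)

# mapply: walk over several vectors in parallel
mapply_res <- mapply(function(a, b) a + b, 1:3, 4:6)

# apply: work over rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)
```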