Linear algebra is the main mathematical technique used to describe and motivate statistical methods and machine learning approaches.
We introduce some of the mathematical concepts needed to understand these techniques and demonstrate how to work with matrices in R.
To learn the mathematical details of statistical and ML theory, you will need to study linear algebra in more depth.
A commonly used operation in data analysis is matrix multiplication.
Linear algebra originated from mathematicians developing systematic ways to solve systems of linear equations, such as this one:
\[ \begin{aligned} x_1 + x_2 + x_3 + x_4 + x_5 &= 15 \\ 2x_1 - x_2 + x_3 - x_4 + x_5 &= 10 \\ -x_1 + 3x_2 - 2x_3 + x_4 - x_5 &= -5 \\ x_1 + 4x_2 + x_3 + 2x_4 + 3x_5 &= 34 \\ 3x_1 - 2x_2 + x_3 - x_4 + 2x_5 &= 20 \end{aligned} \]
We represent a matrix with \(n\) rows and \(p\) columns with a bold upper case letter:
\[ \mathbf{X} = \begin{bmatrix} x_{1,1}&x_{1,2}&\dots & x_{1,p}\\ x_{2,1}&x_{2,2}&\dots & x_{2,p}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n,1}&x_{n,2}&\dots&x_{n,p}\\ \end{bmatrix} \]
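As a minimal sketch in R, the built-in `matrix` function creates such an array. The 3 by 2 example below uses arbitrary values chosen only for illustration; it shows how `dim`, `nrow`, and `ncol` report \(n\) and \(p\), and how `X[i, j]` retrieves the entry \(x_{i,j}\).

```r
# Arbitrary example values: 3 rows (n = 3) and 2 columns (p = 2).
# matrix() fills the entries column by column by default.
X <- matrix(1:6, nrow = 3, ncol = 2)

dim(X)    # 3 2, that is, n and p
nrow(X)   # 3
ncol(X)   # 2
X[2, 1]   # 2, the entry x_{2,1}
X[1, 2]   # 4, the entry x_{1,2}
```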
Vectors are represented with bold lower case letters:
\[ \mathbf{x} = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_p \end{bmatrix} \]
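Rows of a matrix are also represented with bold lower case letters; for example, \(\mathbf{x}_i\) denotes the \(i\)th row of \(\mathbf{X}\):
\[ \mathbf{x}_i = \begin{bmatrix} x_{i,1}\\ x_{i,2}\\ \vdots\\ x_{i,p} \end{bmatrix} \]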
Bold lower case letters are also commonly used to represent matrix columns rather than rows.
This can be confusing because \(\mathbf{x}_1\) can represent either the first row or the first column of \(\mathbf{X}\).
One way to distinguish them is to borrow notation from computer code, with \(:\) representing all: \(\mathbf{X}_{1,:}\) represents the first row and \(\mathbf{X}_{:,1}\) the first column.
Another approach is to distinguish by the letter used to index, with \(i\) used for rows and \(j\) used for columns, so \(\mathbf{x}_i\) is the \(i\)th row and \(\mathbf{x}_j\) is the \(j\)th column.
With this approach, it is important to state which dimension, row or column, is being represented.
Further confusion can arise because it is common to represent all vectors, including the rows of a matrix, as one-column matrices.
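In R, the \(\mathbf{X}_{1,:}\) and \(\mathbf{X}_{:,1}\) notation corresponds roughly to `X[1, ]` and `X[, 1]`. The sketch below, using an arbitrary 3 by 2 example matrix, also shows that R drops an extracted row or column to a plain vector with no orientation, so you have to ask explicitly, with `drop = FALSE` or `as.matrix`, for a one-row or one-column matrix.

```r
X <- matrix(1:6, nrow = 3, ncol = 2)

# Analogues of X_{1,:} (first row) and X_{:,1} (first column)
row1 <- X[1, ]   # 1 4
col1 <- X[, 1]   # 1 2 3

# By default R drops the result to a plain vector, with no row or column shape
is.matrix(row1)  # FALSE

# Keep the row as a 1 x 2 matrix ...
row1_as_row <- X[1, , drop = FALSE]
dim(row1_as_row) # 1 2

# ... or store it as a one-column (2 x 1) matrix, the convention used for vectors
row1_as_col <- as.matrix(X[1, ])
dim(row1_as_col) # 2 1
```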
Mathematicians figured out that by representing systems of linear equations with matrices and vectors, they could design general-purpose algorithms to solve them.
These algorithms, such as Gaussian elimination, Gauss-Jordan elimination, and the LU and QR decompositions, are usually covered in detail in university-level linear algebra courses.
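Matrix multiplication is defined for an \(m \times n\) matrix \(\mathbf{A}\) and an \(n \times p\) matrix \(\mathbf{B}\), so the number of columns of \(\mathbf{A}\) must equal the number of rows of \(\mathbf{B}\):
\[ \mathbf{A} = \begin{pmatrix} a_{11}&a_{12}&\dots&a_{1n}\\ a_{21}&a_{22}&\dots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}&a_{m2}&\dots&a_{mn} \end{pmatrix}, \, \mathbf{B} = \begin{pmatrix} b_{11}&b_{12}&\dots&b_{1p}\\ b_{21}&b_{22}&\dots&b_{2p}\\ \vdots&\vdots&\ddots&\vdots\\ b_{n1}&b_{n2}&\dots&b_{np} \end{pmatrix} \]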
The product \(\mathbf{A}\mathbf{B}\) is the \(m \times p\) matrix whose entry in row \(i\) and column \(j\) is \(a_{i1}b_{1j} + \dots + a_{in}b_{nj}\):
\[ \tiny \mathbf{A}\mathbf{B} = \begin{pmatrix} a_{11}b_{11} + \dots + a_{1n}b_{n1}& a_{11}b_{12} + \dots + a_{1n}b_{n2}& \dots& a_{11}b_{1p} + \dots + a_{1n}b_{np}\\ a_{21}b_{11} + \dots + a_{2n}b_{n1}& a_{21}b_{12} + \dots + a_{2n}b_{n2}& \dots& a_{21}b_{1p} + \dots + a_{2n}b_{np}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}b_{11} + \dots + a_{mn}b_{n1}& a_{m1}b_{12} + \dots + a_{mn}b_{n2}& \dots& a_{m1}b_{1p} + \dots + a_{mn}b_{np}\\ \end{pmatrix} \]
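The definition can be checked numerically. In R, `%*%` computes the matrix product; the sketch below compares it with a direct double loop over the entry formula \(\sum_k a_{ik}b_{kj}\), using two small matrices picked only for illustration.

```r
# Small example matrices: A is 2 x 3 (m = 2, n = 3), B is 3 x 2 (n = 3, p = 2)
A <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)
B <- matrix(c(1, 0,
              2, 1,
              0, 3), nrow = 3, byrow = TRUE)

# Built-in matrix product; here AB has rows (5, 11) and (14, 23)
AB <- A %*% B

# Same product from the definition: entry (i, j) is the sum over k of a_{ik} * b_{kj}
AB_loop <- matrix(0, nrow = nrow(A), ncol = ncol(B))
for (i in seq_len(nrow(A))) {
  for (j in seq_len(ncol(B))) {
    AB_loop[i, j] <- sum(A[i, ] * B[, j])
  }
}

all.equal(AB, AB_loop)  # TRUE
```

Note that `A %*% B` requires `ncol(A) == nrow(B)`; R signals a "non-conformable arguments" error otherwise.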
So how does this definition of matrix multiplication help solve systems of equations?
Any system of \(n\) linear equations in \(n\) unknowns
\[ \begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n}x_n &= b_1\\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n}x_n &= b_2\\ \vdots\\ a_{n1} x_1 + a_{n2} x_2 + \dots + a_{nn}x_n &= b_n\\ \end{aligned} \]
can be written using matrix multiplication by defining
\[ \mathbf{A} =\begin{pmatrix} a_{11}&a_{12}&\dots&a_{1n}\\ a_{21}&a_{22}&\dots&a_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{n1}&a_{n2}&\dots&a_{nn} \end{pmatrix} ,\, \mathbf{b} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_n \end{pmatrix} ,\, \mbox{ and } \mathbf{x} = \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} \]
With these definitions, the whole system becomes
\[ \mathbf{A}\mathbf{x} = \mathbf{b} \]
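To make this concrete, here is one way to encode the five-equation system from the beginning of this section in R. The vector `x` is a candidate solution we worked out by hand with Gaussian elimination (an assumption of this sketch, not something stated earlier); multiplying gives back `b`, confirming that entry \(i\) of \(\mathbf{A}\mathbf{x}\) is the left-hand side of equation \(i\).

```r
# Coefficient matrix and right-hand side of the five-equation system
A <- matrix(c( 1,  1,  1,  1,  1,
               2, -1,  1, -1,  1,
              -1,  3, -2,  1, -1,
               1,  4,  1,  2,  3,
               3, -2,  1, -1,  2), nrow = 5, byrow = TRUE)
b <- c(15, 10, -5, 34, 20)

# Candidate solution worked out by hand (Gaussian elimination)
x <- c(27, 4, 0, 27, 47) / 7

# Entry i of A %*% x is the left-hand side of equation i evaluated at x
lhs <- as.vector(A %*% x)
all.equal(lhs, b)  # TRUE
```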
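If \(\mathbf{A}\) is invertible, multiplying both sides on the left by its inverse \(\mathbf{A}^{-1}\) gives the solution:
\[ \mathbf{A}^{-1}\mathbf{A}\mathbf{x} = \mathbf{x} = \mathbf{A}^{-1} \mathbf{b} \]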
The function `solve` works well for small to medium-sized matrices whose columns have a similar range and that do not contain too many 0s.
The function `qr.solve` can be used when this is not the case.
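A minimal sketch applying both functions to the five-equation system from the start of the section (the encoding of `A` and `b` is ours). For a well-behaved system like this one, both should agree up to floating-point error.

```r
A <- matrix(c( 1,  1,  1,  1,  1,
               2, -1,  1, -1,  1,
              -1,  3, -2,  1, -1,
               1,  4,  1,  2,  3,
               3, -2,  1, -1,  2), nrow = 5, byrow = TRUE)
b <- c(15, 10, -5, 34, 20)

x_solve <- solve(A, b)     # solves A x = b directly
x_qr    <- qr.solve(A, b)  # solves A x = b via a QR decomposition of A

x_solve                    # close to c(27, 4, 0, 27, 47) / 7
all.equal(x_solve, x_qr)   # TRUE
```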
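The identity matrix, denoted \(\mathbf{I}\), plays the role of the number 1 in matrix multiplication: for any matrix \(\mathbf{X}\) with compatible dimensions,
\[ \mathbf{I}\mathbf{X} = \mathbf{X} \]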
Here \(\mathbf{I}\) is the square matrix with 1s on the diagonal and 0s everywhere else:
\[ \mathbf{I}=\begin{pmatrix} 1&0&\dots&0\\ 0&1&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\dots&1 \end{pmatrix} \]
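In R, `diag(n)` returns the \(n \times n\) identity matrix. This short check, with an arbitrary 3 by 3 matrix, confirms that multiplying by it on either side leaves the matrix unchanged (compared with `all.equal`, which allows for floating-point tolerance).

```r
I3 <- diag(3)   # 3 x 3 identity matrix

# An arbitrary 3 x 3 matrix to multiply
X <- matrix(c(2, -1, 0,
              4,  3, 1,
              0,  5, 7), nrow = 3, byrow = TRUE)

all.equal(I3 %*% X, X)  # TRUE
all.equal(X %*% I3, X)  # TRUE
```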
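The inverse of a square matrix \(\mathbf{A}\), denoted \(\mathbf{A}^{-1}\), is the matrix that satisfies
\[ \mathbf{A}^{-1}\mathbf{A} = \mathbf{I} \]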
Because the default for the second argument in `solve` is an identity matrix, if we simply type `solve(A)`, we obtain the inverse \(\mathbf{A}^{-1}\).
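A quick check of this, again with our own R encoding of the five-equation system: multiplying `solve(A)` by `A` on either side should give `diag(5)` up to rounding error.

```r
A <- matrix(c( 1,  1,  1,  1,  1,
               2, -1,  1, -1,  1,
              -1,  3, -2,  1, -1,
               1,  4,  1,  2,  3,
               3, -2,  1, -1,  2), nrow = 5, byrow = TRUE)

A_inv <- solve(A)   # second argument defaults to the identity matrix

all.equal(A_inv %*% A, diag(5))  # TRUE, up to rounding error
all.equal(A %*% A_inv, diag(5))  # TRUE, up to rounding error
```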
This means we can also obtain a solution to our system of equations with:
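A sketch of that computation, reusing our encoding of the five-equation system. Computing `solve(A) %*% b` returns a 5 by 1 matrix, while `solve(A, b)` returns a plain vector; the two should agree numerically. In practice, calling `solve(A, b)` directly is generally preferable because it avoids forming the inverse explicitly.

```r
A <- matrix(c( 1,  1,  1,  1,  1,
               2, -1,  1, -1,  1,
              -1,  3, -2,  1, -1,
               1,  4,  1,  2,  3,
               3, -2,  1, -1,  2), nrow = 5, byrow = TRUE)
b <- c(15, 10, -5, 34, 20)

x_inv    <- solve(A) %*% b   # A^{-1} b, a 5 x 1 matrix
x_direct <- solve(A, b)      # same solution as a plain vector

dim(x_inv)                              # 5 1
all.equal(as.vector(x_inv), x_direct)   # TRUE
```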
A common operation when working with matrices is the transpose.
We use the transpose to express several concepts, such as distance, in matrix notation.
This operation simply converts the rows of a matrix into columns.
We use the symbols \(\top\) or \('\) next to the bold upper case letter to denote the transpose:
\[ \tiny \text{if } \, \mathbf{X} = \begin{bmatrix} x_{1,1}&\dots & x_{1,p} \\ x_{2,1}&\dots & x_{2,p} \\ \vdots & \ddots & \vdots \\ x_{n,1}&\dots & x_{n,p} \end{bmatrix} \text{ then }\, \mathbf{X}^\top = \begin{bmatrix} x_{1,1}&x_{2,1}&\dots & x_{n,1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1,p}&x_{2,p}&\dots & x_{n,p} \end{bmatrix} \]
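In R, the transpose is computed with `t()`. Using an arbitrary 3 by 2 example, the sketch below checks the dimensions, that entry \((i, j)\) of \(\mathbf{X}\) is entry \((j, i)\) of \(\mathbf{X}^\top\), and that transposing twice returns the original matrix.

```r
X  <- matrix(1:6, nrow = 3, ncol = 2)  # 3 x 2
Xt <- t(X)                             # 2 x 3

dim(Xt)                  # 2 3
X[2, 1] == Xt[1, 2]      # TRUE: entry (2, 1) of X is entry (1, 2) of t(X)
all(X[, 2] == Xt[2, ])   # TRUE: column 2 of X is row 2 of t(X)
all.equal(t(Xt), X)      # TRUE: transposing twice returns X
```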
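Using the transpose, and the convention that each row \(\mathbf{x}_i\) is represented as a column vector, we can write \(\mathbf{X}\) in terms of its rows:
\[ \mathbf{X} = \begin{bmatrix} \mathbf{x}_1^\top\\ \mathbf{x}_2^\top\\ \vdots\\ \mathbf{x}_n^\top \end{bmatrix} \]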
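The same idea in R, as a sketch with an arbitrary example matrix: extracting a row and wrapping it with `as.matrix` gives the column vector \(\mathbf{x}_i\); transposing it gives a 1 by \(p\) row, and stacking these transposed rows with `rbind` rebuilds \(\mathbf{X}\).

```r
X <- matrix(1:6, nrow = 3, ncol = 2)

# x_2 stored as a one-column (p x 1) matrix, and its transpose as a 1 x p row
x2 <- as.matrix(X[2, ])
dim(x2)     # 2 1
dim(t(x2))  # 1 2

# Stack the transposed rows x_1^T, ..., x_n^T on top of each other
rows <- lapply(seq_len(nrow(X)), function(i) t(as.matrix(X[i, ])))
X_rebuilt <- do.call(rbind, rows)
all.equal(X_rebuilt, X)  # TRUE
```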