2024-12-02
Machine learning has achieved remarkable successes in a variety of applications.
These range from the postal service’s use of machine learning for reading handwritten zip codes to the development of voice recognition systems.
Other significant advances include movie recommendation systems, spam and malware detection, housing price prediction algorithms, and the ongoing development of autonomous vehicles.
The field of Artificial Intelligence (AI) has been evolving for several decades.
Traditional AI systems, including some chess-playing machines, often relied on decision-making based on preset rules and knowledge representation.
However, with the increased availability of data, machine learning has gained prominence.
It focuses on decision-making through algorithms trained with data.
In recent years, the terms AI and Machine Learning have been used interchangeably in many contexts, though they have distinct meanings.
- Outcome: what we want to predict.
- Features: what we use to predict the outcome.
- Algorithm: a procedure that takes feature values as input and returns a prediction for the outcome.
We train an algorithm using a dataset for which we do know the outcome, and then apply the algorithm to cases for which we don't know the outcome.
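As a minimal sketch of this train-then-apply workflow, consider the following (the choice of scikit-learn and of k-nearest neighbors is ours; the data and variable names are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: n = 6 observations, p = 2 features,
# with known binary outcomes coded as 0/1.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                    [6.0, 9.0], [1.2, 0.8], [5.5, 8.5]])
y_train = np.array([0, 0, 1, 1, 0, 1])

# Train the algorithm on data where the outcome is known.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Apply it to new observations where the outcome is unknown.
X_new = np.array([[1.1, 1.5], [5.2, 8.2]])
y_hat = model.predict(X_new)
print(y_hat)  # predictions for the new observations
```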
The number of classes can vary greatly across applications.
We denote the \(K\) categories with indexes \(k=1,\dots,K\).
However, for binary data we will use \(k=0,1\) for mathematical convenience, as we demonstrate later.
Examples of outcomes include the digits of a handwritten zip code, whether an email is spam, and the price of a house.
We use \(y_i\) to denote the \(i\)-th outcome and \(x_{i,1}, \dots, x_{i,p}\) the corresponding features, also referred to as predictors or covariates.
We use matrix notation \(\mathbf{x}_i = (x_{i,1}, \dots, x_{i,p})^\top\) to denote the vector of predictors.
When referring to the random variables that generate the data, we use upper case:

\[ Y \mbox{ and } \mathbf{X} = (X_{1}, \dots, X_{p}) \]

Note that we drop the index \(i\) because upper case represents the random variables that generate the observations. We use lower case, for example \(\mathbf{X} = \mathbf{x}\), to denote observed values.
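To make the notation concrete, here is a small sketch of ours (assuming NumPy; all names and sizes are hypothetical) mapping the symbols to arrays: the outcomes form a length-\(n\) vector, and the features form an \(n \times p\) matrix whose \(i\)-th row is \(\mathbf{x}_i\).

```python
import numpy as np

n, p = 4, 3  # n observations, p features (hypothetical sizes)

# y[i] holds the outcome y_i; binary outcomes are coded as 0/1.
y = np.array([0, 1, 1, 0])

# X[i, j] holds the feature value x_{i,j}; row i is the vector x_i.
X = np.arange(n * p, dtype=float).reshape(n, p)

x_1 = X[0]        # the feature vector x_1 (Python indexes from 0)
print(y[0], x_1)  # the first outcome y_1 and its features
```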
The machine learning task is to construct a function \(f\) that turns the features into a prediction:

\[ \hat{y} = f(x_1,x_2,\dots,x_p) \]
The general setup is as follows.
We have a series of features and an unknown outcome we want to predict:
outcome | feature 1 | feature 2 | feature 3 | \(\dots\) | feature p |
---|---|---|---|---|---|
? | \(X_1\) | \(X_2\) | \(X_3\) | \(\dots\) | \(X_p\) |
To build \(f\), we collect data for which we do know the outcome:

outcome | feature 1 | feature 2 | feature 3 | \(\dots\) | feature p |
---|---|---|---|---|---|
\(y_{1}\) | \(x_{1,1}\) | \(x_{1,2}\) | \(x_{1,3}\) | \(\dots\) | \(x_{1,p}\) |
\(y_{2}\) | \(x_{2,1}\) | \(x_{2,2}\) | \(x_{2,3}\) | \(\dots\) | \(x_{2,p}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
\(y_n\) | \(x_{n,1}\) | \(x_{n,2}\) | \(x_{n,3}\) | \(\dots\) | \(x_{n,p}\) |
When the outcome is continuous, we refer to the machine learning task as prediction.
We use the term actual outcome \(y\) to denote what we end up observing.
We want the prediction \(\hat{y}\) to match the actual outcome \(y\) as closely as possible.
We define the error as the difference between the prediction and the actual outcome: \(y - \hat{y}\).
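As an illustration, the sketch below (our own, assuming NumPy; the simulated data and the use of least squares as \(f\) are assumptions, not a prescription) fits a linear predictor on training observations and computes the errors \(y - \hat{y}\) on held-out ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated continuous outcome: y = 2*x1 - x2 + noise (hypothetical).
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

# Fit f on the first 80 observations by least squares.
X_train = np.column_stack([np.ones(80), X[:80]])  # add intercept column
beta, *_ = np.linalg.lstsq(X_train, y[:80], rcond=None)

# Predict on the remaining 20 observations and compute errors y - y_hat.
X_test = np.column_stack([np.ones(20), X[80:]])
y_hat = X_test @ beta
errors = y[80:] - y_hat
print(np.mean(errors**2))  # mean squared error
```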
When the outcome is categorical, we refer to the machine learning task as classification.
The main output of the model is a decision rule that prescribes which of the \(K\) classes we should predict. For example, given functions \(f_1, \dots, f_K\) of the features:
\[ \mbox{When } f_k(x_1, x_2, \dots, x_p) > C, \mbox{ predict category } k \]
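A minimal sketch of such a decision rule (ours; the scores \(f_k\) shown are hypothetical and \(C\) is a fixed cutoff):

```python
import numpy as np

def decision_rule(f, C=0.5):
    """Predict category k when f_k(x) > C; f holds the K scores f_1, ..., f_K."""
    k = int(np.argmax(f))           # class with the largest score
    return k if f[k] > C else None  # None: no score clears the cutoff

# Hypothetical scores f_1(x), f_2(x), f_3(x) for K = 3 classes.
f = np.array([0.2, 0.7, 0.1])
print(decision_rule(f))  # predicts class index 1
```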