2024-12-02
Machine learning has achieved remarkable successes in a variety of applications.
These range from the postal service’s use of machine learning for reading handwritten zip codes to the development of voice recognition systems.
Other significant advances include movie recommendation systems, spam and malware detection, housing price prediction algorithms, and the ongoing development of autonomous vehicles.
The field of Artificial Intelligence (AI) has been evolving for several decades.
Traditional AI systems, including some chess-playing machines, often relied on decision-making based on preset rules and knowledge representation.
However, with the increased availability of data, machine learning has gained prominence.
It focuses on decision-making through algorithms trained with data.
In recent years, the terms AI and Machine Learning have been used interchangeably in many contexts, though they have distinct meanings.
- Outcome: what we want to predict.
- Features: what we use to predict the outcome.
- Algorithm: a procedure that takes feature values as input and returns a prediction for the outcome.
We train an algorithm using a dataset for which we do know the outcome, and then apply the algorithm to cases for which we don't know the outcome.
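As a minimal sketch of this train-then-apply workflow, consider the following (the choice of scikit-learn and of k-nearest neighbors is ours; the data and variable names are hypothetical):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: n = 6 observations, p = 2 features,
# with known binary outcomes coded as 0/1.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                    [6.0, 9.0], [1.2, 0.8], [5.5, 8.5]])
y_train = np.array([0, 0, 1, 1, 0, 1])

# Train the algorithm on data where the outcome is known.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Apply it to new observations where the outcome is unknown.
X_new = np.array([[1.1, 1.5], [5.2, 8.2]])
y_hat = model.predict(X_new)
print(y_hat)  # predictions for the new observations
```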
The number of classes can vary greatly across applications.
We denote the \(K\) categories with indexes \(k=1,\dots,K\).
However, for binary data we will use \(k=0,1\) for mathematical convenience, as we demonstrate later.
Examples of outcomes include the digits of a handwritten zip code, whether an email is spam, and the price of a house.
We use \(y_i\) to denote the \(i\)-th outcome and \(x_{i,1}, \dots, x_{i,p}\) the corresponding features, also referred to as predictors or covariates.
We use matrix notation \(\mathbf{x}_i = (x_{i,1}, \dots, x_{i,p})^\top\) to denote the vector of predictors.
When referring to the random variables that generate the data, we use upper case:

\[ Y \mbox{ and } \mathbf{X} = (X_{1}, \dots, X_{p}) \]

Note that we drop the index \(i\) because upper case represents the random variables that generate the observations. We use lower case, for example \(\mathbf{X} = \mathbf{x}\), to denote observed values.
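To make the notation concrete, here is a small sketch of ours (assuming NumPy; all names and sizes are hypothetical) mapping the symbols to arrays: the outcomes form a length-\(n\) vector, and the features form an \(n \times p\) matrix whose \(i\)-th row is \(\mathbf{x}_i\).

```python
import numpy as np

n, p = 4, 3  # n observations, p features (hypothetical sizes)

# y[i] holds the outcome y_i; binary outcomes are coded as 0/1.
y = np.array([0, 1, 1, 0])

# X[i, j] holds the feature value x_{i,j}; row i is the vector x_i.
X = np.arange(n * p, dtype=float).reshape(n, p)

x_1 = X[0]        # the feature vector x_1 (Python indexes from 0)
print(y[0], x_1)  # the first outcome y_1 and its features
```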
The machine learning task is to construct a function \(f\) that turns the features into a prediction:

\[ \hat{y} = f(x_1,x_2,\dots,x_p) \]
The general setup is as follows.
We have a series of features and an unknown outcome we want to predict:
outcome | feature 1 | feature 2 | feature 3 | \(\dots\) | feature p |
---|---|---|---|---|---|
? | \(X_1\) | \(X_2\) | \(X_3\) | \(\dots\) | \(X_p\) |
To build \(f\), we collect data for which we do know the outcome:

outcome | feature 1 | feature 2 | feature 3 | \(\dots\) | feature p |
---|---|---|---|---|---|
\(y_{1}\) | \(x_{1,1}\) | \(x_{1,2}\) | \(x_{1,3}\) | \(\dots\) | \(x_{1,p}\) |
\(y_{2}\) | \(x_{2,1}\) | \(x_{2,2}\) | \(x_{2,3}\) | \(\dots\) | \(x_{2,p}\) |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) |
\(y_n\) | \(x_{n,1}\) | \(x_{n,2}\) | \(x_{n,3}\) | \(\dots\) | \(x_{n,p}\) |
When the outcome is continuous, we refer to the machine learning task as prediction.
We use the term actual outcome \(y\) to denote what we end up observing.
We want the prediction \(\hat{y}\) to match the actual outcome \(y\) as closely as possible.
We define the error as the difference between the prediction and the actual outcome: \(y - \hat{y}\).
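As an illustration, the sketch below (our own, assuming NumPy; the simulated data and the use of least squares as \(f\) are assumptions, not a prescription) fits a linear predictor on training observations and computes the errors \(y - \hat{y}\) on held-out ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated continuous outcome: y = 2*x1 - x2 + noise (hypothetical).
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)

# Fit f on the first 80 observations by least squares.
X_train = np.column_stack([np.ones(80), X[:80]])  # add intercept column
beta, *_ = np.linalg.lstsq(X_train, y[:80], rcond=None)

# Predict on the remaining 20 observations and compute errors y - y_hat.
X_test = np.column_stack([np.ones(20), X[80:]])
y_hat = X_test @ beta
errors = y[80:] - y_hat
print(np.mean(errors**2))  # mean squared error
```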
When the outcome is categorical, we refer to the machine learning task as classification.
The main output of the model is a decision rule that prescribes which of the \(K\) classes we should predict. For example, given functions \(f_1, \dots, f_K\) of the features:
\[ \mbox{When } f_k(x_1, x_2, \dots, x_p) > C, \mbox{ predict category } k \]
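A minimal sketch of such a decision rule (ours; the scores \(f_k\) shown are hypothetical and \(C\) is a fixed cutoff):

```python
import numpy as np

def decision_rule(f, C=0.5):
    """Predict category k when f_k(x) > C; f holds the K scores f_1, ..., f_K."""
    k = int(np.argmax(f))           # class with the largest score
    return k if f[k] > C else None  # None: no score clears the cutoff

# Hypothetical scores f_1(x), f_2(x), f_3(x) for K = 3 classes.
f = np.array([0.2, 0.7, 0.1])
print(decision_rule(f))  # predicts class index 1
```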