[WIP] Gaussian Process Regression

Notes on Gaussian Process Regression

What is a Gaussian process?

A Gaussian process is a collection of indexed random variables such that any finite collection of them follows a multivariate normal distribution. Here is an example of a Gaussian process: $$ \{ Y_{1}, Y_{2}, Y_{3}, Y_{4}, Y_{5}, \dots \} $$ where the random variables are indexed (by subscript) by a time step $t \in \mathbb{Z}$. Indexing by time like this is common when the variables have a natural sequential order. Another example is this: $$ \{ Y_{-1.5}, Y_{-1}, Y_{-0.75}, Y_{0}, Y_{0.4}, \dots \} $$ where the random variables are indexed by a non-temporal variable $x \in \mathbb{R}$. This is really nothing but a table of inputs and (random) outputs:

| $x$   | $y$         |
|-------|-------------|
| -1.5  | $Y_{-1.5}$  |
| -1    | $Y_{-1}$    |
| -0.75 | $Y_{-0.75}$ |
| 0     | $Y_{0}$     |
| 0.4   | $Y_{0.4}$   |

Note: a collection of random variables indexed by time or space is called a stochastic process.
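To make the definition concrete, here is a minimal sketch (Python with NumPy; the variable names and the covariance choice are mine) showing that evaluating a Gaussian process at the five $x$ locations above is nothing more than drawing one sample from a 5-dimensional multivariate normal. The particular covariance used here is a placeholder; covariance functions are discussed in the Kernels section below.

```python
import numpy as np

# The five index locations from the table above.
x = np.array([-1.5, -1.0, -0.75, 0.0, 0.4])

# Any finite collection {Y_x} of a Gaussian process is jointly multivariate
# normal. For illustration, assume a zero mean and a squared-exponential
# covariance (this choice is explained in the Kernels section below).
mean = np.zeros(len(x))
cov = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)

# One joint draw of (Y_{-1.5}, Y_{-1}, Y_{-0.75}, Y_{0}, Y_{0.4}).
rng = np.random.default_rng(seed=0)
y = rng.multivariate_normal(mean, cov)
print(dict(zip(x, y.round(3))))
```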

Why Gaussian processes?

There are two main reasons for using Gaussian processes.

The first is that any collection of random variables from a Gaussian process jointly forms a multivariate normal distribution. For example, we can gather $Y_{-1}$ and $Y_{0.4}$ and define their joint distribution: $$ \left(\begin{array}{l} Y_{-1} \\ Y_{0.4} \end{array}\right) \sim N( \mu, \Sigma ) $$ Why is this useful? A multivariate Gaussian distribution is governed by two parameters: $\mu$, the mean, and, more importantly, $\Sigma$, the covariance. Being able to specify the covariance lets us state how every pair of points is correlated!
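Concretely (as a sketch, assuming zero means and writing the covariance entries with the covariance function $k$ introduced in the Kernels section below), the joint distribution of this pair looks like:

$$ \left(\begin{array}{l} Y_{-1} \\ Y_{0.4} \end{array}\right) \sim N\!\left( \left(\begin{array}{l} 0 \\ 0 \end{array}\right), \left(\begin{array}{ll} k(-1,\,-1) & k(-1,\,0.4) \\ k(0.4,\,-1) & k(0.4,\,0.4) \end{array}\right) \right) $$

The off-diagonal entry $k(-1, 0.4)$ is exactly the knob that says how strongly $Y_{-1}$ and $Y_{0.4}$ move together.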

From the perspective of a person collecting the data, the full set of observations can be thought of as one sample from some multivariate Gaussian distribution.

The second wonderful thing about Gaussian processes is that the derived distributions, such as the prior and the posterior, are also Gaussian. Why does that matter? Gaussian distributions have closed-form algebraic expressions 🤗!
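As a sketch of what "closed form" means here: if the function values $\mathbf{f}$ we want to predict and the observed values $\mathbf{y}$ are jointly Gaussian with zero mean and a partitioned covariance, then conditioning on $\mathbf{y}$ stays Gaussian, and the posterior mean and covariance follow from the standard Gaussian conditioning identities:

$$ \left(\begin{array}{l} \mathbf{f} \\ \mathbf{y} \end{array}\right) \sim N\!\left( \mathbf{0}, \left(\begin{array}{ll} \Sigma_{ff} & \Sigma_{fy} \\ \Sigma_{yf} & \Sigma_{yy} \end{array}\right) \right) \quad\Longrightarrow\quad \mathbf{f} \mid \mathbf{y} \sim N\!\left( \Sigma_{fy}\Sigma_{yy}^{-1}\mathbf{y},\; \Sigma_{ff} - \Sigma_{fy}\Sigma_{yy}^{-1}\Sigma_{yf} \right) $$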

Kernels

What's left to be done is to define the covariance for any set of random variables. To do that, we need what is called the covariance function, more commonly known as the kernel. This function takes in a pair of inputs $x_1$ and $x_2$ and returns a number representing the covariance between the corresponding random variables. Kernels are usually written in shorthand as $$ k(x_1, x_2) $$ or $$ k(x, x') $$ or using $t$ instead of $x$ (because processes can also be indexed by time $t$).

The kernel encodes how we expect the values at any two indices to be correlated with one another. Defining it pins down one of the two parameters of a multivariate Gaussian distribution, the covariance (the other being the mean).
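As an example, here is a sketch of one commonly used kernel, the squared-exponential (RBF) kernel, and of how it turns a set of inputs into a full covariance matrix. The function name and the lengthscale/variance parameters are my own choices, not something defined earlier in these notes.

```python
import numpy as np

def squared_exponential(x1, x2, lengthscale=1.0, variance=1.0):
    """Covariance between two inputs: large when x1 and x2 are close."""
    return variance * np.exp(-0.5 * (x1 - x2) ** 2 / lengthscale**2)

# Build the covariance matrix Sigma for the x locations from earlier:
# Sigma[i, j] = k(x_i, x_j).
x = np.array([-1.5, -1.0, -0.75, 0.0, 0.4])
Sigma = np.array([[squared_exponential(a, b) for b in x] for a in x])

print(np.round(Sigma, 3))  # 1 on the diagonal, decaying off the diagonal
```

Nearby inputs such as $x=-1$ and $x=-0.75$ get a covariance close to 1, while distant inputs such as $x=-1.5$ and $x=0.4$ get a much smaller covariance, which captures the intuition that nearby points should produce similar values.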

A kernel

Other features

Resources

A Visual Exploration of Gaussian Processes