Non-parametric models

Bayesian non-parametric models are regression models which learn a prior distribution over functions at training time. This is achieved by learning so-called kernel hyperparameters from the training data set, which specify a Gaussian process prior. Inference is then performed at test time by applying Bayes’ rule to this prior with the available data [79]. Performing inference at test time in this way is referred to as lazy learning (as opposed to the eager learning presented in the previous section, where inference happens during training and only a single pass through the network is required to make predictions).

Gaussian process models can be tuned to have desirable properties for many applications; for example, one can assume that the training data is noise-free, in which case the relevant function can be learned from very few samples. Gaussian processes are particularly useful for the global optimisation of non-linear functions, because the predictive uncertainty can be used to decide where the next sample should be taken [80]. Gaussian processes are not used in this thesis (except as a comparison in some of the numerical examples), but we will briefly describe how they relate to the Bayesian parametric models discussed in this chapter.
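As a concrete illustration of this training-time/test-time split, the following is a minimal sketch (in NumPy) of exact Gaussian process regression with a squared-exponential kernel. The length-scale, signal variance and noise level stand in for the kernel hyperparameters that would ordinarily be fitted to the training data, for example by maximising the marginal likelihood; the small noise term approximates the noise-free assumption mentioned above while keeping the factorisation numerically stable.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    sq_dists = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)

def gp_posterior(x_train, y_train, x_test, length_scale=1.0, noise=1e-6):
    """Condition the GP prior on the training data (Bayes' rule)."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale)
    k_ss = rbf_kernel(x_test, x_test, length_scale)

    chol = np.linalg.cholesky(k_tt)                 # lower-triangular factor of K
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ alpha                           # posterior predictive mean
    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v                            # posterior predictive covariance
    return mean, cov

# 'Lazy learning': nothing is fitted up front here; the work happens at
# prediction time, when the prior is conditioned on the observations.
x_train = np.array([-2.0, -0.5, 1.0, 2.5])
y_train = np.sin(x_train)
x_test = np.linspace(-3.0, 3.0, 50)
mean, cov = gp_posterior(x_train, y_train, x_test)
# Predictive uncertainty, e.g. for choosing where to sample next in
# global optimisation (clipped to guard against small negative values
# arising from floating-point round-off).
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```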

It can be shown that a single-layer Bayesian neural network with an infinite number of neurons in its hidden layer is equivalent to a Gaussian process, and such a network can therefore serve as a more convenient alternative, since its inference is performed at training time [81]. Variational approximations to Gaussian processes make inference computationally tractable for large data sets and deep architectures [82]. Neural processes learn a distribution over functions in a similar way to Gaussian processes, but at significantly reduced computational expense, since only a forward pass through the neural network is required at test time [83]. Neural processes are useful for meta-learning (learning to learn); for example, they have been used to learn to predict 2D views of 3D scenes given limited training data [84].
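To give a flavour of this equivalence, the short sketch below draws functions from the prior of a random single-hidden-layer network whose output weights are scaled by 1/sqrt(H), following Neal's construction [81]. The width H and the tanh non-linearity are illustrative choices only: as H grows, the joint distribution of the network outputs at any finite set of inputs tends to a multivariate Gaussian, i.e. a Gaussian process prior over functions.

```python
import numpy as np

def sample_prior_function(x, hidden=10_000, rng=None):
    """Draw one function from the prior of a random single-layer network."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.normal(size=(1, hidden))      # input-to-hidden weights
    b = rng.normal(size=hidden)           # hidden biases
    # Output weights scaled by 1/sqrt(hidden) so the output variance
    # remains finite in the infinite-width limit.
    v = rng.normal(size=hidden) / np.sqrt(hidden)
    return np.tanh(x[:, None] @ w + b) @ v

x = np.linspace(-3.0, 3.0, 100)
# Each draw is a smooth random function; for large `hidden`, these are
# (approximately) samples from the corresponding Gaussian process prior.
draws = [sample_prior_function(x, rng=np.random.default_rng(seed))
         for seed in range(5)]
```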