In a classical design of experiments (DoE) you usually choose a set of points according to some rule and perform experiments to be able to, for example, create a response surface. But when the properties of the process you are trying to describe is difficult to understand and can be destroyed if wrong parameters are applied we have to try something different.
One solution could be to build a predictive model each time a new sample has been taken and decide where to take the next sample given information taken from the updated model. I am going to show you how Gaussian Processes (see the introduction) can be used to collect samples efficiently. In short, the algorithm teaches itself how the process works by asking the correct questions based on what is known, slowly expanding its knowledge safely.
The properties of the Gaussian Process relies on the chosen kernel. In this example, the squared exponential is used which for looks like:
This kernel is used to control the curvature of the estimated function.
The formula for estimating the conditional distribution of the Gaussian Process gives us an expression the covariance:
What is nice about this formula is that it is not dependent on any measurements. Given a kernel and a set of hyperparameters you only need to decide where you want to measure to understand what uncertainty you should expect when predicting the function. This fact makes it possible to design a space-filling experiment design for a given assumption of the properties of the model.
Now recall when we have some measurements we can generate a model such as:
The gray area shows one standard deviation. When the standard deviation is small, we can make good predictions about the function while higher standard deviation indicates that we lack information. Given the four samples, we should be tempted to measure where the standard deviation is high. Just looking at the standard deviation as a function clarifies this thought:
If we have defined a limited domain on the horizontal axis, it should be straightforward to choose the point with the highest standard deviation. This is ok as long as the process cannot be destroyed for a set of parameters. Assume that we do not know exactly for which parameters we reach safety limits, then we need to expand slowly from the measurements we are aware are safe. One way of doing this is to use the squared exponential kernel to include an allowed action radius. Drawing some kernels around the measurements looks like:
And if we take the maximum of these four functions we get:
Notice that the maximum is small between the two points on the left while the kernels are smeared together on the right since they are closer together. This function can be used to describe how safe it is to measure at a given set of parameters.
We can now combine the kernels with the standard deviation by taking the product ending up with:
Now we are encouraged to measure in the vicinity of each data point, but not too close and not too far away. Since the standard deviation is lower when points are closer to each other exploration is often prioritized before refining.
We are going to try to generate a model of the function on the interval constrained by as seen here:
We need to have some knowledge about the process to be able to give the process one or several safe points to start from. We are going to start with $x_0 = 0$ and the goal is to obtain a sequence of $p_i = (x_i, f(x_i))$ for which we can predict the function with good precision.
To find a new candidate we need to have a set of candidates to choose from. The set of candidates are generated using a space-filling random algorithm, in our case the Sobol sequence.
Here is a sequence of 21 samples taken using the method described above.
Notice how the algorithm is cautious to start with and then starts expanding to the right and left, occasionally going back to refine the model instead of exploring. It also does not violate the condition .
Progressive sampling is useful when the process you want to describe is nonlinear and when you need to avoid breaking any constraints. The method scales well to many dimensions and can be automated in actual physical testing environments. We can also handle noisy measurements which would result in slower propagation since the uncertainty of predictions would be larger.
We could add additional constraints which are tailored to the problem at hand, for example scaling the width of the kernel depending on the estimated magnitude of the gradient for each measurement or adding other functions which control how samples are chosen.