In my previous article, I introduced some concepts which are necessary if we want to set an optimization problem in a multivariate environment.
Here, we will first dwell on how to check the smoothness of a surface (which is the main assumption to deploy an optimization task), then we will see how to look for local/global maximum/minimum in a multivariate environment.
Smoothness and differentiability
In a very intuitive way, we can say that a surface is smooth if there are neither holes nor corners nor jumps. It is the same idea of continuity in 1-dimensional space, but extended to a multivariate environment.
More specifically, stating that a surface is differentiable in a point p0 means that we can reach that point in any ways we want, not only via the partial derivatives. That means that there exists a plane, tangent to p0, which can approximate the surface in a neighborhood (better: any neighborhood ) of p0.
Let’s visualize this concept with the following surface:
We have that the value of the point p0 on the surface and on the tangent plane is, by definition, the same. On the other hand, picking a point close to p0, let’s say p, we can see that the value of p on the surface is different from that on the plane: we will call this difference error. Hence, we can represent the approximation of the surface on a generic point p as follows:
The idea is that the plane is a good approximation of the surface if the error is negligible with respect to the distance between p0 and p. In other words, we want the error to be a small o of the distance, hence:
Nice, now we can move towards the optimization procedure.
As anticipated, from now on we will consider differentiability as given: this will be our strong assumption. With that being said, let’s first visualize the kind of problem we want to solve:
The idea is finding those values of maximum and minimum of our surface, and we can do that starting with a fundamental theorem, the Fermat Theorem:
“if f(x,y) is differentiable in its natural domain, then every optimal point has a horizontal tangent plane”.
So if we look at the equation of a tangent plane:
It means that the red part must be equal to zero, in order for that plane to be horizontal. It means that the Gradient of the function, whose components are the partial derivative, must be equal to the 0-vector:
So, by setting the gradient equal to zero, we are picking all the candidates for optimal points. However, how can we classify them? For this purpose, we need the second order Taylor polynomial, and then study the sign of the red quantity:
If it is positive, it means that the function of the surface takes values which are greater than p0 everywhere around it, hence p0 is a local minimum. On the other side, if the quantity is negative, it means that the function takes values which are lower than p0 everywhere around it, hence p0 is a local maximum.
We can easily study the sign of that quantity as it is a quadratic form with representative matrix:
Which is called Hessian matrix. So, we can say that if H evaluated at p0 is positive definite, p0 is a local minimum; if H is negative definite, p0 is a local maximum; if it is indefinite, p0 is not an optimal point. Everything holds provided that H is a no-singular matrix (hence its determinant must be different from 0).
Optimization is a fundamental concept in data science and there are many different techniques one could employ. However, the idea behind is always the same: picking an objective function (in case of Machine Learning algorithms, it is represented by the loss function, hence optimization means minimization) and set the problem as in a multivariate environment.
Of course, pre-built algorithm will do all the computations for you, nevertheless it is important to get the intuition behind, since even self-learning algorithms need to be tuned (or at least initialized).