Differential calculus is a powerful tool for finding the optimal solution to a given task. By ‘optimal solution’, I mean the result of optimizing a given function, called the objective function. This result might be either a maximum (for example, if your objective function describes your revenues) or a minimum (for example, if your objective function represents your costs).
Optimization procedures are fundamental in data science: most Machine Learning algorithms are trained by optimizing some criterion, so that they return the best possible result. For example, in Neural Networks the final output (that is, the set of parameters of the model) is obtained through the Back-propagation phase, which supplies the gradients used to optimize the coefficients with a technique called Gradient Descent (you can read more about this technique here).
Here, I’m going to dwell on the math behind optimization in a multivariate environment. Since multivariate differential calculus involves many steps and concepts, I decided to split this topic into two parts: the first one covers some introductory concepts, while the second one focuses on the optimization procedure itself.
By the way, before starting with the first part, if you are not familiar with vectors and planes in higher-dimensional spaces, I suggest reading my previous article, where I provide a visual and intuitive explanation of them.
Directional Derivative and Tangent Plane
As anticipated, we are going to examine multivariate functions. In particular, we will look at functions from R^2 to R:
$$f: \mathbb{R}^2 \to \mathbb{R}, \qquad (x, y) \mapsto z = f(x, y)$$
Their graphs are surfaces in three-dimensional space, like the following:
The purpose of this first part is to find the tangent plane to the surface at a given point p0. This is the first step in studying the regularity of that surface: continuity is necessary for differentiability, which in turn is what the optimization procedures rely on. To do so, we will cover the following concepts:
- Directional derivative
- Tangent vectors
- Tangent plane
So let’s start.
The directional derivative at a point p0 is, conceptually, the rate at which the function changes at that point in the direction v. It is commonly denoted by $D_v f(p_0)$.
To define it, let’s first recall the definition of the derivative in one dimension:
$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}$$
It is defined as the limit of the difference quotient as the increment h tends to zero. The result is the slope of the tangent line to the curve at the point where the derivative is calculated.
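To make the limit concrete, here is a minimal numerical sketch. The function `square` and the point x0 = 3 are illustrative assumptions, not taken from the article; the point is only to watch the difference quotient settle towards the slope as the increment shrinks.

```python
def difference_quotient(f, x0, h):
    """Slope of the secant line through (x0, f(x0)) and (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h


def square(x):
    # Hypothetical example function; its exact derivative at x0 is 2 * x0.
    return x * x


increments = (1.0, 0.1, 0.01, 0.001)
quotients = [difference_quotient(square, 3.0, h) for h in increments]

# The quotients move towards 6, the slope of x^2 at x0 = 3, as h shrinks.
for h, q in zip(increments, quotients):
    print(f"h = {h:<6} quotient = {q:.4f}")
```

For this particular function the quotient equals 6 + h, so the gap to the true slope is the increment itself.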
Conceptually, this does not change in our multivariate environment. Here, the idea is to compute the difference quotient between a point p0 on the surface and an incremented point, say pt, also on the surface.
Let’s first visualize those two points:
So we want to compute:
As you can see, we need three ingredients: the values of the function at those two points, and the increment between them.
For this purpose, we can restrict the function of our surface, f(x,y), to the straight line crossing both p0 and pt, that is, look only at the curve on the surface that lies above that line.
Now let’s focus on the green line and examine it in the two-dimensional (x, y) plane:
Specifically, this is a straight line through p0 with direction v (the same direction as our directional derivative). We can easily write the parametric expression of this straight line and obtain the coordinates of pt.
Furthermore, we can also compute the last ingredient of our difference quotient, which is the increment. We can compute it as the distance between p0 and pt, adjusted for sign and for the length of v, which leaves the parameter t:
Hence our final expression of the Directional Derivative will be:
This is a limit in the single variable t rather than in multiple variables.
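Written out, the construction above might look as follows. This is a reconstruction under assumed notation: p_0 = (x_0, y_0), v = (v_1, v_2) taken as a unit vector, and D_v f(p_0) for the directional derivative.

```latex
\begin{aligned}
p_t &= p_0 + t\,v = (x_0 + t v_1,\; y_0 + t v_2), \\
\|p_t - p_0\| &= |t|\,\|v\| = |t| \quad \text{(for } \|v\| = 1\text{)}, \\
D_v f(p_0) &= \lim_{t \to 0} \frac{f(x_0 + t v_1,\, y_0 + t v_2) - f(x_0, y_0)}{t}.
\end{aligned}
```

The only quantity that varies in the last line is t, which is why the multivariate limit collapses to a one-variable one.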
Now let’s consider a specific direction for our directional derivative: a vector v parallel to the x-axis, so that its second component is 0. For the sake of simplicity, let’s set the first component equal to 1, so that v=(1,0) is a unit vector. In that case, our directional derivative becomes:
This is the definition of the partial derivative of f(x,y) with respect to x. The same reasoning holds if we pick a direction vector parallel to the y-axis, namely v=(0,1), which gives the partial derivative with respect to y:
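In symbols, and assuming the same notation as before (p_0 = (x_0, y_0), increment t), the two special cases can be sketched as:

```latex
\begin{aligned}
v = (1, 0): \quad \frac{\partial f}{\partial x}(x_0, y_0) &= \lim_{t \to 0} \frac{f(x_0 + t,\, y_0) - f(x_0, y_0)}{t}, \\
v = (0, 1): \quad \frac{\partial f}{\partial y}(x_0, y_0) &= \lim_{t \to 0} \frac{f(x_0,\, y_0 + t) - f(x_0, y_0)}{t}.
\end{aligned}
```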
If we collect those two partial derivatives into one vector, we obtain the so-called gradient of f(x,y), $\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$. We can evaluate the gradient at any point of the natural domain of the function where the partial derivatives exist. Wherever it is nonzero, the resulting vector is orthogonal to the level curves of our function.
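A small numerical sketch can illustrate both ideas: partial derivatives as directional derivatives along the axes, and the gradient being orthogonal to a level curve. The function `paraboloid`, the point `p0`, and the finite increment `t` are assumptions chosen for illustration; a finite `t` only approximates the limit.

```python
def directional_derivative(f, p0, v, t=1e-6):
    """Approximate the limit in t with a small but finite increment t."""
    x0, y0 = p0
    v1, v2 = v
    return (f(x0 + t * v1, y0 + t * v2) - f(x0, y0)) / t


def gradient(f, p0):
    """Collect the directional derivatives along (1, 0) and (0, 1) into one vector."""
    return (
        directional_derivative(f, p0, (1.0, 0.0)),
        directional_derivative(f, p0, (0.0, 1.0)),
    )


def paraboloid(x, y):
    # Hypothetical example: level curves are circles centred at the origin.
    return x**2 + y**2


p0 = (1.0, 2.0)
grad = gradient(paraboloid, p0)  # close to the exact gradient (2, 4)

# The circle through p0 has tangent direction (-y0, x0) at p0.
level_tangent = (-p0[1], p0[0])
dot = grad[0] * level_tangent[0] + grad[1] * level_tangent[1]

print("gradient:", grad)
print("dot product with level-curve tangent:", dot)  # close to zero
```

The dot product is not exactly zero because the increment is finite, but it shrinks together with `t`.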
Now we have almost all the ingredients to find the equation of the tangent plane. To do so, we will once more restrict our function, this time to the two coordinate lines through p0, then find the tangent vectors of the two resulting curves, compute the direction d orthogonal to the plane containing those two tangent vectors and, finally, compute the equation of the plane with direction d.
The point where the two coordinate lines intersect is p0=(x0,y0):
Let’s proceed step by step, as in the picture above:
- Step 1: we already have the first two components of each coordinate line (x free and y fixed for the x-coordinate line, y free and x fixed for the y-coordinate line). However, since we want the corresponding curves on the surface, we also need a third component, which is nothing but the function evaluated along the coordinate line:
- Step 2: since the previous expressions are nothing but the equations of two curves, we can take the derivative of each component with respect to the parameter t to compute the tangent vectors:
Note that the third components are the partial derivatives of f with respect to x and y, respectively (indeed, in the first system x=t, hence differentiating w.r.t. t is the same as differentiating w.r.t. x. The same holds for y in the second system).
- Step 3: we need to compute the direction d of our tangent plane. Since we know that d has to be orthogonal to every vector lying in the plane (you can read more about orthogonality here), and since that plane contains the two tangent vectors computed above, it follows that d has to be orthogonal to both tangent vectors, hence:
Where gamma is a parameter. Since we need only one vector as a direction, we can easily set gamma=1, for the sake of simplicity, hence:
- Step 4: now it’s time to compute the equation of our plane, which can easily be derived from the following orthogonality condition:
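The four steps above can be written out as follows. This is a reconstruction, not the article's original formulas: the curve names r_x and r_y, the sign convention chosen for d, and the shorthand f_x, f_y for the partial derivatives evaluated at (x_0, y_0) are all assumptions.

```latex
\begin{aligned}
\text{Step 1:}\quad & r_x(t) = \bigl(t,\; y_0,\; f(t, y_0)\bigr), \qquad r_y(t) = \bigl(x_0,\; t,\; f(x_0, t)\bigr) \\
\text{Step 2:}\quad & r_x'(x_0) = (1,\; 0,\; f_x), \qquad r_y'(y_0) = (0,\; 1,\; f_y) \\
\text{Step 3:}\quad & d \cdot r_x'(x_0) = 0,\quad d \cdot r_y'(y_0) = 0
  \;\Longrightarrow\; d = \gamma\,(-f_x,\; -f_y,\; 1), \qquad \gamma = 1 \\
\text{Step 4:}\quad & d \cdot \bigl(x - x_0,\; y - y_0,\; z - f(x_0, y_0)\bigr) = 0 \\
 & \Longrightarrow\; z = f(x_0, y_0) + f_x\,(x - x_0) + f_y\,(y - y_0)
\end{aligned}
```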
If we examine this last expression, we can notice that it is the two-variable extension of the first-order Taylor polynomial used to approximate a curve near x0:
$$f(x) \approx f(x_0) + f'(x_0)\,(x - x_0)$$
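As a rough sketch of the whole procedure, the code below builds the tangent plane numerically and uses it as a first-order approximation. It makes a few assumptions: the partial derivatives are approximated with a finite increment, the example function `paraboloid` is hypothetical, and the direction d is obtained with a cross product, which is a shortcut for finding a vector orthogonal to both tangent vectors rather than the system solved in Step 3.

```python
def partials(f, p0, t=1e-6):
    """Finite-increment approximations of the two partial derivatives at p0."""
    x0, y0 = p0
    fx = (f(x0 + t, y0) - f(x0, y0)) / t
    fy = (f(x0, y0 + t) - f(x0, y0)) / t
    return fx, fy


def cross(a, b):
    """Cross product of two 3-dimensional vectors, orthogonal to both."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def tangent_plane(f, p0):
    """Return the direction d and a function z = plane(x, y) for the tangent plane at p0."""
    x0, y0 = p0
    z0 = f(x0, y0)
    fx, fy = partials(f, p0)
    tangent_x = (1.0, 0.0, fx)  # tangent vector of the x-coordinate curve
    tangent_y = (0.0, 1.0, fy)  # tangent vector of the y-coordinate curve
    d = cross(tangent_x, tangent_y)  # equals (-fx, -fy, 1)

    def plane(x, y):
        # Solve d . (x - x0, y - y0, z - z0) = 0 for z.
        return z0 - (d[0] * (x - x0) + d[1] * (y - y0)) / d[2]

    return d, plane


def paraboloid(x, y):
    # Hypothetical example surface.
    return x**2 + y**2


d, plane = tangent_plane(paraboloid, (1.0, 2.0))
approx = plane(1.1, 2.1)       # value on the tangent plane
exact = paraboloid(1.1, 2.1)   # value on the surface
print("d =", d)
print("plane:", approx, "surface:", exact)
```

Close to p0 the plane and the surface nearly agree, and the gap grows as the evaluation point moves away, which is the behaviour expected from a first-order approximation.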
The existence of a tangent plane to a surface at a given point is fundamental if you want to study the smoothness of that surface. Indeed, you can think about smoothness at p0 as the possibility of approaching p0 from any direction, not only those corresponding to the partial derivatives (whose existence alone is not a sufficient condition for regularity of the surface).
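To see why partial derivatives alone are not enough, consider the following sketch. The function used here is a standard textbook counterexample and is an assumption, not something taken from the article: both partial derivatives exist at the origin, yet the difference quotient along the diagonal does not settle.

```python
def f(x, y):
    # Hypothetical counterexample: x*y / (x^2 + y^2), set to 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)


def difference_quotient(f, p0, v, t):
    """Difference quotient of f at p0 along direction v with increment t."""
    x0, y0 = p0
    return (f(x0 + t * v[0], y0 + t * v[1]) - f(x0, y0)) / t


origin = (0.0, 0.0)
steps = (1e-1, 1e-2, 1e-3)

# Along the axes the function is identically zero, so both partials are 0.
along_x = [difference_quotient(f, origin, (1.0, 0.0), t) for t in steps]
along_y = [difference_quotient(f, origin, (0.0, 1.0), t) for t in steps]

# Along the diagonal f equals 1/2 away from the origin, so the quotient grows like 1 / (2t).
# The direction (1, 1) is left unnormalised; normalising would only rescale the quotients.
diagonal = [difference_quotient(f, origin, (1.0, 1.0), t) for t in steps]

print("along x:", along_x)
print("along y:", along_y)
print("diagonal:", diagonal)
```

The surface cannot be approached consistently from every direction at the origin, so no tangent plane exists there even though both partial derivatives do.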
The next step towards our optimization task will be checking the differentiability of our surface, which is the assumption behind the gradient-based optimization procedures we are interested in. So stay tuned for Part 2!