Computer vision is a field of study which aims at gaining a deep understanding from digital images or videos. Combined with AI and ML techniques, today many industries are investing in researches and solutions of computer vision.

In my article published on Towards Data Science (if you haven’t read it already, I recommend you to do so before reading the following), I’ve explained the process behind object detection and, in particular, the way the Haar Cascade Algorithm actually works.

Here, I will dive deeper into the concept of Integral Image: it is a data structure and algorithm for generating the sum of values in a rectangular subset of a grid. The goal is reducing the number of computations needed to obtain the summations of pixel intensities within a window.

So let’s see how it works.

The idea is transforming an input images into a summed-area table, where the value at any point (*x*, *y*) in that table is the sum of all the pixels above and to the left of (*x*, *y*), inclusive:

Where *I(x,y)* is the value of the integral image pixel in the position *(x,y)*, while *i(x,y)* is the corresponding intensity in the original image. It is a recursive formula, hence, if we start from one corner of the input image, we will have the same result in the integral image. To make it clearer, let’s see an example:

As you can see, I added one row and column of zeros, since we need one step backward in order to start the recursive formula. Hence, if your image is *w* pixels wide and *h* pixels high, then the integral of this will be *w+1* pixels wide and *h+1*pixels high.

Moving to the computations, let’s start from the first pixel in the original image with intensity 1: the integral image returns exactly the same value, since it is computing (1+0+0). Then, pixel ‘3’ becomes ‘4’, since it is 3+1+0+0. With the same procedure, we obtain an ‘8’ (7+1+0) and a ’20’ (9+3+1+7).

Nice, so we have a new image, but how is it supposed to be useful? The answer rely in an unique property of the integral image. Indeed, it turned our that if you need to compute the summation within a window in the input image, hence that summation is equal to a linear combination of the corresponding window’s corner in the integral image, as follows:

Where A, B, C and D are the corners of the corresponding window in the integral image.

This reduces the number of computations by far. To give you an idea, consider a 100×100 image with a 9×9 window. We want to compute the sum of the pixel intensities within that window, which requires 8 operations. If we repeat this procedure 100 times, we obtain 800 operations.

Now let’s see the integral image approach. First, we compute the summed-area table, which requires 56 operations. Then, considering the same 9×9 window, to compute the sum of pixel intensity we just need the above formula, which is made of 3 operations. Hence, the total number of operations is 56+3*100=356. As you can see, it is less than a half.

This procedure is widely used in computer vision and Haar Cascade algorithm is based exactly on that.

Again, you can find my Haar Cascade publiaction here.