Why Padding Is Important in Convolutional Neural Networks (CNNs)

Last Modified : Wednesday, May 01, 2024

When solving most computer vision problems, one of the important tasks is collecting images (the dataset). The images will usually come in different shapes, but to train our model on them we need all of them in one shape.

Yes, you are right: we can simply reshape all of our images into one shape. But there is a problem with that, so let’s understand it.


Suppose you are building a cat-or-dog classifier. You downloaded images from Google, and while looking through them you noticed that some are rectangular and some are square, so you decided to reshape all of them into a square shape.

Now remember that you are reshaping rectangular images into squares, so the rectangular ones are going to be squeezed or stretched: maybe the dog’s face or the cat’s leg will look bigger than usual, maybe their bodies will look thinner, and many more things can happen.


There is a solution to this problem…

Padding

In layman’s language, padding is adding zeros around the edges of the image matrix while it is being processed.

Let’s understand it briefly.

Note: I am now going to refer to kernels. If you don’t know about them, you can read my previous blog, or, for lazy people like me: a kernel is a small matrix that moves over the input data.

When a 3 x 3 kernel moves over a 6 x 6 image one pixel at a time, it gives us a 4 x 4 output matrix.

The image below shows how many times the 3 x 3 kernel moves over each single pixel (element). The colored sections tell us that the kernel passes over the middle of the image more often than over the edges. So the output matrix will have more knowledge about the middle part of the image than about the edges, and we might lose some important features from the edges.
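
The 4 x 4 size comes from counting the positions the kernel can occupy: along each side, an image of n pixels leaves n - k + 1 positions for a kernel of k pixels. Here is a minimal sketch in Python, assuming the kernel moves one pixel at a time and there is no padding (the function name `valid_output_size` is my own, used only for illustration):

```python
def valid_output_size(image_size: int, kernel_size: int) -> int:
    """Side length of the output when a kernel slides over an image
    one pixel at a time (stride 1) with no padding."""
    if kernel_size > image_size:
        raise ValueError("kernel must not be larger than the image")
    return image_size - kernel_size + 1


print(valid_output_size(6, 3))  # 4, i.e. a 4 x 4 output matrix
```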

Number of times the kernel moves over each pixel of the image
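
If you want to reproduce those counts yourself, the following rough NumPy sketch slides a 3 x 3 window over a 6 x 6 grid and adds 1 to every pixel the window touches. It assumes a stride of 1 and no padding, and the helper name `coverage_counts` is hypothetical:

```python
import numpy as np


def coverage_counts(image_size: int, kernel_size: int) -> np.ndarray:
    """Count how many kernel positions include each pixel (stride 1, no padding)."""
    counts = np.zeros((image_size, image_size), dtype=int)
    positions = image_size - kernel_size + 1  # kernel positions along each side
    for row in range(positions):
        for col in range(positions):
            # every pixel under the kernel at this position is visited once more
            counts[row:row + kernel_size, col:col + kernel_size] += 1
    return counts


counts = coverage_counts(6, 3)
print(counts)
```

A corner pixel is visited only once, while the pixels in the center are visited nine times, which is exactly the imbalance described above.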

A solution to this problem is to add an extra border of pixels around the image. Adding one pixel on every side means a 6 x 6 image becomes an 8 x 8 image.

Now another question arises: what values should we put in this border?

The input matrix is an image, so it has values from 0 to 255 (the color range). If we filled the border with random values from this range, they would become part of the image and the kernel would extract that unnecessary information. So we add a constant value, 0 or 255 (0 is black and 255 is white); 0 is the usual choice, which is why this is called zero padding.

Note: The padded input matrix is 8 x 8 and the kernel is 3 x 3, so the output matrix is 6 x 6, the same size as the original image. The kernel now passes over the original edge pixels more often, so the output keeps more of the features from the edges.
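
Adding a one-pixel border of zeros can be sketched with NumPy's `np.pad`. The 6 x 6 array below is only a stand-in for a grayscale image, and the variable names are my own:

```python
import numpy as np

# Stand-in for a 6 x 6 grayscale image (values 1..36, inside the 0-255 range)
image = np.arange(1, 37).reshape(6, 6)

# pad_width=1 adds one pixel on every side; constant_values=0 makes the border black
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(image.shape)   # (6, 6)
print(padded.shape)  # (8, 8)
```

Passing `constant_values=255` instead would give a white border.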

The same idea helps when you are reshaping your images: instead of stretching a rectangular image into a square, you can pad its shorter side with extra pixels until it becomes a square, so the content of the image is not distorted.

OK, got it, add an extra border; but how do we decide how many pixels to add?
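
As a rough illustration of padding at the reshaping stage, the sketch below pads the shorter side of a grayscale image so that it becomes square. It assumes a 2-D NumPy array, and the helper `pad_to_square` is hypothetical, not something from a library:

```python
import numpy as np


def pad_to_square(image: np.ndarray, fill_value: int = 0) -> np.ndarray:
    """Pad the shorter side of a 2-D (grayscale) image so height equals width."""
    height, width = image.shape
    side = max(height, width)
    # split the missing rows/columns as evenly as possible between both sides
    top = (side - height) // 2
    bottom = side - height - top
    left = (side - width) // 2
    right = side - width - left
    return np.pad(image, ((top, bottom), (left, right)),
                  mode="constant", constant_values=fill_value)


wide = np.full((4, 6), 200, dtype=np.uint8)  # a 4 x 6 "rectangular" image
square = pad_to_square(wide)
print(square.shape)  # (6, 6)
```

The original pixels are left untouched in the middle; only black rows are added above and below them.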

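To decide that, there is a formula (for a kernel with an odd size that moves one pixel at a time):

padding_dimension = (kernel_dimension - 1) / 2

For a 3 x 3 kernel this gives (3 - 1) / 2 = 1, i.e. one pixel of padding on each side.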
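
To check the formula, here is a small sketch that computes the padding for a few odd kernel sizes and the resulting output size, using output = image + 2 * padding - kernel + 1. It assumes a stride of 1, and both function names are my own:

```python
def same_padding(kernel_size: int) -> int:
    """Pixels to add on each side so the output keeps the input size (stride 1)."""
    if kernel_size % 2 == 0:
        raise ValueError("the formula assumes an odd kernel size")
    return (kernel_size - 1) // 2


def output_size(image_size: int, kernel_size: int, padding: int) -> int:
    """Output side length for stride 1 with `padding` pixels added on each side."""
    return image_size + 2 * padding - kernel_size + 1


for kernel_size in (3, 5, 7):
    padding = same_padding(kernel_size)
    print(kernel_size, padding, output_size(6, kernel_size, padding))
```

For each of these kernels, the 6 x 6 image comes out as 6 x 6 again once the padding from the formula is applied.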