Understanding Input and Output Shapes in Convolutional Neural Networks | Keras
Aug 31
Even if we understand Convolutional Neural Networks (CNNs) theoretically, quite a few of us still get confused about their input and output shapes when fitting data to the network. This guide will help you understand the input and output shapes of a CNN.
Let’s see what the input shape looks like. We assume that our data is a collection of images.
Input Shape
You always have to give a 4D array as input to the CNN. With Keras's default channels-last format, the input data has a shape of (batch_size, height, width, depth), where the first dimension is the batch size (the number of images in the batch) and the other three dimensions are the height, width and depth of each image. If you are wondering what the depth of an image is, it is simply the number of colour channels. For example, an RGB image has a depth of 3 and a greyscale image has a depth of 1.
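To make this layout concrete, here is a small NumPy sketch. The sizes (16 images of 10×10 pixels) are made-up values chosen only for illustration, and the arrays are filled with zeros rather than real image data.

```python
import numpy as np

# Hypothetical batch: 16 RGB images, each 10 x 10 pixels.
rgb_batch = np.zeros((16, 10, 10, 3), dtype="float32")
print(rgb_batch.shape)   # (16, 10, 10, 3)
print(rgb_batch.ndim)    # 4

# Greyscale images are often stored without a channel axis ...
grey_images = np.zeros((16, 10, 10), dtype="float32")
# ... so we add a trailing depth axis of size 1 to make the array 4D.
grey_batch = grey_images[..., np.newaxis]
print(grey_batch.shape)  # (16, 10, 10, 1)
```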
Output Shape
The output of the CNN is also a 4D array. The batch size is the same as the input batch size, but the other three dimensions may change depending on the number of filters, the kernel size and the padding we use.
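As a rough sketch of how these settings determine the output shape, the helper below applies the usual size rules for "valid" and "same" padding. The function name `conv2d_output_shape` and its `stride` parameter are introduced here purely for illustration and are not part of Keras; a square kernel is assumed.

```python
import math

def conv2d_output_shape(input_shape, filters, kernel_size, padding="valid", stride=1):
    """Return the (batch, height, width, depth) output shape of a 2D convolution."""
    batch, height, width, _ = input_shape
    if padding == "same":
        # "same" padding keeps height/width (divided by the stride, rounded up).
        out_h = math.ceil(height / stride)
        out_w = math.ceil(width / stride)
    elif padding == "valid":
        # "valid" padding only places the kernel where it fully fits.
        out_h = (height - kernel_size) // stride + 1
        out_w = (width - kernel_size) // stride + 1
    else:
        raise ValueError(f"unknown padding: {padding!r}")
    # The output depth equals the number of filters.
    return (batch, out_h, out_w, filters)

same_shape = conv2d_output_shape((None, 10, 10, 3), filters=64, kernel_size=3, padding="same")
valid_shape = conv2d_output_shape((None, 10, 10, 3), filters=64, kernel_size=3, padding="valid")
print(same_shape)   # (None, 10, 10, 64)
print(valid_shape)  # (None, 8, 8, 64)
```

Note that the batch dimension passes through unchanged, while the depth is replaced by the number of filters.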
Let’s look at the following code snippet.
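A minimal sketch of such a snippet is shown here. The 3×3 kernel and `padding="same"` are assumptions (the text does not state them); they are chosen so that the result matches the shapes discussed in this section.

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential()
# 64 filters; the 3x3 kernel and "same" padding are assumptions.
# input_shape describes ONE image (height, width, depth), without the batch size.
model.add(keras.layers.Conv2D(64, (3, 3), padding="same", input_shape=(10, 10, 3)))

print(model.output_shape)  # (None, 10, 10, 64)

# At call time the data must still be 4D; any batch size works, e.g. 4 images.
batch = np.zeros((4, 10, 10, 3), dtype="float32")
result_shape = tuple(model(batch).shape)
print(result_shape)  # (4, 10, 10, 64)
```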
Don’t get tricked by the input_shape argument here. Although it looks like our input shape is 3D, you have to pass a 4D array when fitting the data, with a shape like (batch_size, 10, 10, 3). Since the input_shape argument contains no batch size, we can use any batch size while fitting the data.
As you can see, the output shape is (None, 10, 10, 64). The first dimension represents the batch size, which is None at the moment because the network does not know the batch size in advance. Once you fit the data, the batch size you supply takes the place of None.
Let’s look at another code snippet.
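A minimal sketch of a model with a fixed batch size follows. The version discussed in this section passes `batch_input_shape=(16, 10, 10, 3)` to the first layer; the sketch instead fixes the batch size through an explicit `keras.Input` layer, which should have the same effect. That substitution is an assumption made here because, as far as we know, some newer Keras releases no longer accept `batch_input_shape` on layers. The 3×3 kernel and `padding="same"` are also assumptions.

```python
from tensorflow import keras

model = keras.Sequential([
    # Fix the batch size to 16 up front (plays the role of batch_input_shape).
    keras.Input(shape=(10, 10, 3), batch_size=16),
    # 64 filters; kernel size and padding are assumptions.
    keras.layers.Conv2D(64, (3, 3), padding="same"),
])

print(model.output_shape)  # (16, 10, 10, 64)
```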
Here I have replaced the input_shape argument with batch_input_shape. As the name suggests, this argument requires the batch size in advance, and you cannot use any other batch size when fitting the data. For example, with a batch size of 16 you have to feed the data to the network in batches of 16 only.
Now you can see that the output shape also has a batch size of 16 instead of None.
Attaching a Dense layer to a Convolution layer
We can simply add a convolution layer on top of another convolution layer, since the output of a convolution layer has the same number of dimensions (4D) as its input.
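For instance, two convolution layers can be stacked directly, as in the sketch below. The filter counts, kernel size and padding are illustrative assumptions.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(10, 10, 3)),
    keras.layers.Conv2D(64, (3, 3), padding="same"),  # 4D in, 4D out
    keras.layers.Conv2D(32, (3, 3), padding="same"),  # consumes that 4D output directly
])

print(model.output_shape)  # (None, 10, 10, 32)
```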
We usually add Dense layers on top of the Convolution layers to classify the images. However, the input to a dense layer is a 2D array of shape (batch_size, units), while the output of the convolution layer is a 4D array. Thus we have to convert the output of the convolution layer to a 2D array.
We can do this by inserting a Flatten layer on top of the Convolution layer. The Flatten layer squashes the three dimensions of an image into a single dimension. Now we have a 2D array of shape (batch_size, squashed_size), which is acceptable for dense layers.
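The sketch below traces the shapes through such a stack. It uses the functional API so that the intermediate shapes can be inspected; the 10 output classes, the 3×3 kernel and `padding="same"` are assumptions for illustration.

```python
from tensorflow import keras

inputs = keras.Input(shape=(10, 10, 3))
conv_out = keras.layers.Conv2D(64, (3, 3), padding="same")(inputs)
flat_out = keras.layers.Flatten()(conv_out)             # 10 * 10 * 64 = 6400 values per image
outputs = keras.layers.Dense(10, activation="softmax")(flat_out)  # 10 hypothetical classes
model = keras.Model(inputs, outputs)

print(tuple(conv_out.shape))  # (None, 10, 10, 64)
print(tuple(flat_out.shape))  # (None, 6400)
print(model.output_shape)     # (None, 10)
```

Here squashed_size is simply height × width × depth of the convolution output.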
Summary

You always have to feed a 4D array of shape (batch_size, height, width, depth) to the CNN.
The output data from the CNN is also a 4D array of shape (batch_size, height, width, depth).
To add a Dense layer on top of a CNN layer, we have to change the 4D output of the CNN to 2D using the Keras Flatten layer.