# Convolutional Neural Networks – Hello World

With all the hustle and bustle around GPT-3, ChatGPT and recent results in image recognition, I realized that it’s about time to learn something in this realm so as not to fall too far behind. Hence, this time I’m going to present the most basic Convolutional Neural Network I could imagine that still exhibits some interesting traits and can serve as a reasonable ‘Hello World’ for anyone who would like to learn the basics.

## This is what is going to happen

• We will lay out the problem
• Then we’re going to present and analyze PyTorch-based CNN code that solves this problem
• Finally, you’ll have a chance to solve this problem on a sheet of paper to understand what our Convolutional Neural Network actually does

Disclaimer: I am no expert in machine learning or neural networks. This is just me having some fun with this incredibly interesting topic. Follow along if you want.

Disclaimer 2: I will be using the term ‘neural network’ even though we actually train a single layer. This might be misleading, but I want you to focus on the process and the fun, not the names.

## The case

Our neural network will deal with image recognition on 3×3-pixel images. We’re going to train it to recognize two types of crosses: a plus-shaped cross (type 0) and an X-shaped cross (type 1).
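
In case the pictures do not come through, here is a small sketch that draws both crosses as text. The pixel grids are inferred from the evaluation images in the code below (a plus sign for type 0, an X for type 1), and `render` is a helper name made up for this illustration.

```python
# Assumed pixel layouts for the two crosses (1 = lit pixel, 0 = dark pixel).
TYPE_0 = [[0, 1, 0],
          [1, 1, 1],
          [0, 1, 0]]  # plus-shaped cross

TYPE_1 = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]  # X-shaped cross

def render(image):
    """Return the image as text, one row per line, '#' for 1 and '.' for 0."""
    return '\n'.join(''.join('#' if pixel else '.' for pixel in row) for row in image)

print('type 0:')
print(render(TYPE_0))
print('type 1:')
print(render(TYPE_1))
```

Both patterns light up exactly five pixels and share the center one, so the network cannot tell them apart by brightness alone; it has to learn where the lit pixels sit.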

## The code

I will print the whole code here as a single block with comments, to make it clear that I don’t have anything besides this code. Nothing is hidden and nothing else is required to run this network: all you have to do is paste the following code into a Python script, have PyTorch installed and run it.

Pay attention to the evaluation part, where the input image will be purposefully missing one piece to verify network performance.

```python
import random
import torch

def generate_type_0_image():
    # type 0 is the plus-shaped cross
    return torch.tensor([[[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]]], dtype=torch.float32)

def generate_type_1_image():
    # type 1 is the X-shaped cross
    return torch.tensor([[[1, 0, 1],
                          [0, 1, 0],
                          [1, 0, 1]]], dtype=torch.float32)

def generate_image():
    # decide randomly which image to generate
    image_type = random.randint(0, 1)

    if image_type == 0:
        image = generate_type_0_image()
    else:
        image = generate_type_1_image()

    # return tuple of image and its type
    # so that the network can learn if it recognized it correctly
    return image, image_type

class NeuralNetwork(torch.nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        # define single convolutional layer to 'scan' the image
        self.conv = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1)
        # define single pooling layer that 'wraps up'
        # what convolutional layer has learned
        self.pool = torch.nn.AvgPool2d(3)

    # this is the place where image is processed
    # returned 'x' holds a single floating point number
    # representing image type (0.0 or 1.0)
    # the network is going to learn to predict it
    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x

# model is initialized
model = NeuralNetwork().to('cpu')
# we pick some 'default' loss function
loss_fn = torch.nn.MSELoss()
# we pick some popular optimizer
# (assumption: plain SGD with lr=0.1; any torch.optim optimizer
# built from model.parameters() fits in this spot)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# model is set in training mode
model.train()

# interestingly, I need 5000 images before the network
# is reliable in this simple task
for i in range(5000):
    # each time a random 'image' is generated
    image, image_type = generate_image()
    # wrap image type as tensor of proper shape
    # (I need these additional dimensions to match
    # CNN output so that I can calculate loss)
    expected_output = torch.tensor([[[image_type]]], dtype=torch.float32)
    # our model predicts if 'image' is of type '0' or '1'
    prediction = model(image)
    # loss function calculates how inaccurate our model was
    loss = loss_fn(prediction, expected_output)
    # I print loss to observe how it goes down during learning process
    print('Loss at', i, '=', loss)
    # this kind of 'resets' optimizer before next iteration
    optimizer.zero_grad()
    # we tell our network to adjust depending on how bad it went this time
    loss.backward()
    # we take another step in the learning process
    optimizer.step()

# I assume the model is ready and switch it to evaluation mode
model.eval()

# now it's the fun part
# let's create a tensor that resembles type '0' image but misses something
almost_type_0_image = torch.tensor([[[0, 1, 0],
                                     [1, 1, 0],
                                     [0, 1, 0]]], dtype=torch.float32)
# we'd expect the network to predict that this is '0' type image
# (actually, ideally it should print 0.0)
# so let's find out how good it is
print('Almost type 0 image is predicted to be', model(almost_type_0_image))

# let's do a similar thing to type '1' image
# now, ideally it should print 1.0
almost_type_1_image = torch.tensor([[[1, 0, 0],
                                     [0, 1, 0],
                                     [1, 0, 1]]], dtype=torch.float32)
print('Almost type 1 image is predicted to be', model(almost_type_1_image))

# finally, let's print what our network has actually learned
# or, in other words, what it really is
# we're going to need it in the next paragraph
print('Convolutional layer weight matrix is', model.conv.weight)
print('Convolutional layer bias is', model.conv.bias)
```

After running this code, your console should print something like:

```
Loss at 1 = tensor(0.1026, grad_fn=<MseLossBackward0>)
Loss at 2 = tensor(1.5652, grad_fn=<MseLossBackward0>)
Loss at 3 = tensor(1.5187, grad_fn=<MseLossBackward0>)
...
Loss at 4997 = tensor(8.7499e-06, grad_fn=<MseLossBackward0>)
Loss at 4998 = tensor(1.5388e-05, grad_fn=<MseLossBackward0>)
Loss at 4999 = tensor(2.6441e-05, grad_fn=<MseLossBackward0>)
Almost type 0 image is predicted to be tensor([[[0.3761]]], grad_fn=<AvgPool2DBackward0>)
Almost type 1 image is predicted to be tensor([[[1.1388]]], grad_fn=<AvgPool2DBackward0>)
Convolutional layer weight matrix is Parameter containing:
tensor([[[[-1.5447, -0.6216, -1.7912],
          [-0.8678,  1.9076, -0.7804],
          ...
Convolutional layer bias is Parameter containing:
...
```

## Interpretation
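Before reading the numbers, it may help to see where the single value per image comes from. Here is a minimal pure-Python sketch of the same forward pass: a 3×3 convolution with one pixel of zero padding (which keeps the 3×3 size), followed by an average over all nine outputs. The function names and the example kernel are made up for illustration; they are not the trained weights printed above.

```python
def conv3x3_same(image, kernel, bias):
    """Slide a 3x3 kernel over a 3x3 image padded with a border of zeros.

    Mirrors Conv2d(1, 1, kernel_size=3, padding=1): the output is 3x3 again.
    """
    size = 3
    # surround the image with zeros so the kernel can sit on border pixels
    padded = [[0.0] * (size + 2) for _ in range(size + 2)]
    for r in range(size):
        for c in range(size):
            padded[r + 1][c + 1] = float(image[r][c])

    output = []
    for r in range(size):
        row = []
        for c in range(size):
            total = bias
            for kr in range(3):
                for kc in range(3):
                    total += kernel[kr][kc] * padded[r + kr][c + kc]
            row.append(total)
        output.append(row)
    return output

def average_pool(feature_map):
    """Mirror AvgPool2d(3) on a 3x3 map: the mean of all nine values."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

def forward(image, kernel, bias):
    """Convolution followed by pooling, giving one number per image."""
    return average_pool(conv3x3_same(image, kernel, bias))

plus_cross = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]

# hypothetical kernel that only looks at the pixel under its center
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

feature_map = conv3x3_same(plus_cross, identity_kernel, bias=0.0)
print(len(feature_map), 'x', len(feature_map[0]))       # the map is still 3 x 3
print(forward(plus_cross, identity_kernel, bias=0.0))   # mean of the image, 5/9
```

Since both steps are just weighted sums, the whole model is a linear function of the nine input pixels plus a bias, which is why removing one pixel shifts the prediction by a fixed amount.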

After training on roughly 2500 type-0 and 2500 type-1 images (the generator picks each type at random), our simple network distinguishes the two well enough that the prediction is still correct even when the input image is imperfect, i.e. missing a pixel:

• For ‘almost type 0 image’ the result is 0.3761, which is way closer to 0.0 than to 1.0
• For ‘almost type 1 image’ the result is 1.1388, which is way closer to 1.0 than to 0.0
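
One way to turn these raw scores into a decision is to pick whichever label is nearer, i.e. to threshold at 0.5. This is a sketch of that reading, not something the training code does; `classify` is a name chosen here for illustration.

```python
def classify(score, threshold=0.5):
    """Map a raw network output to image type 0 or 1 by nearest label."""
    return 0 if score < threshold else 1

# the two predictions reported in the console output
almost_type_0_score = 0.3761
almost_type_1_score = 1.1388

print(classify(almost_type_0_score))  # 0
print(classify(almost_type_1_score))  # 1

# distance to each label, to back up the 'way closer' claim
for score in (almost_type_0_score, almost_type_1_score):
    print(score, 'distance to 0.0:', abs(score - 0.0), 'distance to 1.0:', abs(score - 1.0))
```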
``Yeah, I know I could have used an additional 'Sigmoid' layer to squeeze results between 0 and 1, but I wanted to cut even that to simplify the example, especially the following paragraph.``
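
For reference, a sigmoid squashes any real number into the interval between 0 and 1, which is why it is often appended when an output should look like a probability. The sketch below shows only the function itself in plain Python. A network with such a layer would have to be trained with it in place, so these numbers say nothing about the model above.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# unbounded raw scores on the left, squashed values on the right
for raw in (-4.0, 0.0, 4.0):
    print(raw, '->', round(sigmoid(raw), 3))
```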