Recurrent Neural Networks – Hello World

Last time I wrote about how I learned the basics of Convolutional Neural Networks with PyTorch. This time we'll be dealing with Recurrent Neural Networks. The crucial trait of RNNs and their descendants is the ability to handle inputs of variable size. Whereas for a CNN the input image has to be resized to fit the expected input size, RNNs, LSTMs, generators and similar architectures are here to help with this issue. I've come up with a pretty simple example so that we can go through the whole process and even calculate every tiny step on a sheet of paper if you want!

The exercise

We’re going to use PyTorch’s Recurrent Neural Network (nn.RNN) for the following case:

the values of sin(x) are sampled at evenly spaced points between 0 and 2π. The network will be given a sequence of these values, one value at a time, with the task of predicting several upcoming values.

import math
import random
import torch
import torch.nn as nn

def generate_sine_values():
    num_points = 36
    min_sequence_length = 4
    prediction_length = 1
    start = random.randint(0, num_points - min_sequence_length - prediction_length)
    end = random.randint(start + min_sequence_length + prediction_length, num_points + 1)
    Y = []

    for i in range(start, end):
        x = i * math.pi * 2 / num_points
        y = math.sin(x)
        Y.append(y)

    y = Y.pop()
    
    return torch.tensor(Y).view(-1, 1), torch.tensor(y)

# Set the hyperparameters
input_size = 1
hidden_size = 4
learning_rate = 0.1
num_epochs = 20000
num_to_be_generated = 10

# Create the RNN model
model = nn.RNN(input_size, hidden_size)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Set model to training mode
model.train()

# Train the RNN model
for epoch in range(num_epochs):
    # Reset the hidden state for each sequence
    hidden = torch.zeros(1, hidden_size)

    # Generate input and target
    Y, y = generate_sine_values()

    # Feed the sequence through the RNN and take the last element of the
    # hidden state at the final time step as the predicted value
    # (this model has no separate output layer)
    output, _ = model(Y, hidden)
    y_pred = output[-1, :][-1]

    # Compute the loss
    loss = criterion(y_pred, y)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch: {epoch + 1}/{num_epochs}, Loss: {loss.item():.4f}')

# Set model to evaluation mode
model.eval()

# Print weights and biases
print('----------------------------------')
print('weight_ih_l0, ', model.weight_ih_l0)
print('bias_ih_l0, ', model.bias_ih_l0)
print('weight_hh_l0, ', model.weight_hh_l0)
print('bias_hh_l0, ', model.bias_hh_l0)
print('----------------------------------')

# Generate input
Y, _ = generate_sine_values()

# Predict some next sequence values
for _ in range(num_to_be_generated):
    # Reset the hidden state for each sequence
    hidden = torch.zeros(1, hidden_size)

    # Predict sequence values
    output, _ = model(Y, hidden)
    y_pred = output[-1, :][-1]

    # Append predicted value as a last known value for the next iteration
    Y = torch.cat((Y, y_pred.view(1, 1)))

# Print complete sequence with known and predicted values
for index, y in enumerate(Y):
    # Separate known values from predicted ones
    if (index == Y.size(0) - num_to_be_generated):
        print('----------------------------------')
    print(y.item())

Running this code produces the following output. Of course, the exact numbers depend on the random weight initialisation and the randomly drawn training sequences, so you can't expect your machine to reproduce them exactly.

...
Epoch: 19980/20000, Loss: 0.0009
Epoch: 19990/20000, Loss: 0.0000
Epoch: 20000/20000, Loss: 0.0006
----------------------------------
weight_ih_l0,  Parameter containing:
tensor([[0.4578],
        [0.8740],
        [0.7745],
        [2.5193]], requires_grad=True)
bias_ih_l0,  Parameter containing:
tensor([ 0.2991, -0.1368,  0.5886,  0.2219], requires_grad=True)
weight_hh_l0,  Parameter containing:
tensor([[ 0.3623,  0.0076, -0.9166, -0.2516],
        [ 0.4999,  0.1557,  0.0980, -0.7547],
        [-0.2654,  0.5356,  0.6579,  0.0889],
        [ 0.5429,  0.8763,  0.4217, -1.3401]], requires_grad=True)
bias_hh_l0,  Parameter containing:
tensor([-0.5376,  0.6217, -0.2166, -0.3979], requires_grad=True)
----------------------------------
0.3420201539993286
0.1736481785774231
1.2246468525851679e-16
-0.1736481785774231
-0.3420201539993286
-0.5
----------------------------------
-0.6631757020950317
-0.7941378951072693
-0.8717836737632751
-0.9104017019271851
-0.9280122518539429
-0.9320352077484131
-0.9236619472503662
-0.9006775617599487
-0.8604729771614075
-0.803618848323822

A simple line chart shows how our RNN behaved. As you can see, this simple model turned out to be quite accurate. Your results will depend heavily on the length of the input sequence as well as on the exact point where the input ends; I've picked a relatively accurate run for this case.

The values behind this chart are listed in the table further below. The Input and Prediction columns were taken from the console output, and Baseline is the true value of sin(x) at each step.
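
If you want to redraw the chart yourself, here is a minimal matplotlib sketch (matplotlib is not part of the training script above, it's just an assumption for plotting) that draws the three columns of the table. The index ranges are hard-coded to this particular run and will differ for yours.

import math
import matplotlib.pyplot as plt

num_points = 36

# Ground-truth sine wave over one full period (the Baseline column)
baseline = [math.sin(i * 2 * math.pi / num_points) for i in range(num_points + 1)]

# Input and Prediction columns, copied from the console output of this run
input_idx = list(range(16, 22))
input_vals = [0.342020153999329, 0.173648178577423, 1.22464685258517e-16,
              -0.173648178577423, -0.342020153999329, -0.5]
pred_idx = list(range(21, 32))
pred_vals = [-0.5, -0.663175702095032, -0.794137895107269, -0.871783673763275,
             -0.910401701927185, -0.928012251853943, -0.932035207748413,
             -0.923661947250366, -0.900677561759949, -0.860472977161408,
             -0.803618848323822]

plt.plot(list(range(num_points + 1)), baseline, label='Baseline')
plt.plot(input_idx, input_vals, label='Input')
plt.plot(pred_idx, pred_vals, label='Prediction')
plt.xlabel('i')
plt.ylabel('sin(x)')
plt.legend()
plt.show()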

 

| i  | Baseline              | Input                | Prediction          |
|----|-----------------------|----------------------|---------------------|
| 0  | 0                     |                      |                     |
| 1  | 0.17364817766693      |                      |                     |
| 2  | 0.342020143325669     |                      |                     |
| 3  | 0.5                   |                      |                     |
| 4  | 0.642787609686539     |                      |                     |
| 5  | 0.766044443118978     |                      |                     |
| 6  | 0.866025403784439     |                      |                     |
| 7  | 0.939692620785908     |                      |                     |
| 8  | 0.984807753012208     |                      |                     |
| 9  | 1                     |                      |                     |
| 10 | 0.984807753012208     |                      |                     |
| 11 | 0.939692620785908     |                      |                     |
| 12 | 0.866025403784439     |                      |                     |
| 13 | 0.766044443118978     |                      |                     |
| 14 | 0.642787609686539     |                      |                     |
| 15 | 0.5                   |                      |                     |
| 16 | 0.342020143325669     | 0.342020153999329    |                     |
| 17 | 0.173648177666931     | 0.173648178577423    |                     |
| 18 | 1.22464679914735E-16  | 1.22464685258517E-16 |                     |
| 19 | -0.17364817766693     | -0.173648178577423   |                     |
| 20 | -0.342020143325669    | -0.342020153999329   |                     |
| 21 | -0.5                  | -0.5                 | -0.5                |
| 22 | -0.642787609686539    |                      | -0.663175702095032  |
| 23 | -0.766044443118978    |                      | -0.794137895107269  |
| 24 | -0.866025403784438    |                      | -0.871783673763275  |
| 25 | -0.939692620785908    |                      | -0.910401701927185  |
| 26 | -0.984807753012208    |                      | -0.928012251853943  |
| 27 | -1                    |                      | -0.932035207748413  |
| 28 | -0.984807753012208    |                      | -0.923661947250366  |
| 29 | -0.939692620785909    |                      | -0.900677561759949  |
| 30 | -0.866025403784439    |                      | -0.860472977161408  |
| 31 | -0.766044443118978    |                      | -0.803618848323822  |
| 32 | -0.64278760968654     |                      |                     |
| 33 | -0.5                  |                      |                     |
| 34 | -0.342020143325669    |                      |                     |
| 35 | -0.17364817766693     |                      |                     |
| 36 | -2.44929359829471E-16 |                      |                     |

Finally, let's calculate a single step by hand to find out what this model really does behind the scenes. For this purpose I'm going to use Wolfram Alpha. According to the PyTorch documentation, the RNN cell computes the following:

$h_t = \tanh(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh})$

Below I paste the calculations step by step. Note that the result of each individual calculation becomes the hidden state for the next one. As stated in the formula above, only two things change between steps: x_t (the current sine value) and h_{t-1} (the hidden state from the previous step). During the first step the hidden state is empty, i.e. equal to (0, 0, 0, 0). The first five calculations over already known values build up the hidden state (the memory) of the RNN cell; the sixth and final calculation, over the last known value, produces the prediction.
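
If you'd rather let PyTorch grind through the same arithmetic instead of Wolfram Alpha, here is a minimal sketch that replays the formula above with the weights and the input sequence printed earlier. Keep in mind that the printed weights are rounded to four decimal places, so the numbers will only approximately match the full-precision model.

import torch

# Weights copied from the training run above (yours will differ)
W_ih = torch.tensor([[0.4578], [0.8740], [0.7745], [2.5193]])
b_ih = torch.tensor([0.2991, -0.1368, 0.5886, 0.2219])
W_hh = torch.tensor([[ 0.3623, 0.0076, -0.9166, -0.2516],
                     [ 0.4999, 0.1557,  0.0980, -0.7547],
                     [-0.2654, 0.5356,  0.6579,  0.0889],
                     [ 0.5429, 0.8763,  0.4217, -1.3401]])
b_hh = torch.tensor([-0.5376, 0.6217, -0.2166, -0.3979])

# Known input values, taken from the console output above
inputs = [0.3420201539993286, 0.1736481785774231, 1.2246468525851679e-16,
          -0.1736481785774231, -0.3420201539993286, -0.5]

h = torch.zeros(4)  # the empty initial hidden state (0, 0, 0, 0)
for x in inputs:
    x_t = torch.tensor([x])
    # h_t = tanh(x_t @ W_ih^T + b_ih + h_{t-1} @ W_hh^T + b_hh)
    h = torch.tanh(x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh)
    print(h)

# The last element of the final hidden state is the first predicted value
print('prediction:', h[-1].item())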

Look, the last value of this tensor, -0.663155, matches -0.663175702095032 – the first value predicted by our RNN cell! From now on the network predicts based on its own previous predictions, so the accuracy inevitably deteriorates. But for the sake of the exercise I find it really exciting, so let's go one step further!
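
Going one step further just means appending the value we have just predicted to the list of known inputs and replaying the same formula over the now longer sequence, exactly as the generation loop in the full script does. Continuing the sketch above (it reuses inputs, the weight tensors and h defined there):

# Treat the first prediction as the newest known value
inputs.append(h[-1].item())

# Reset the hidden state and run the longer sequence through the same formula,
# mirroring the generation loop in the full script
h = torch.zeros(4)
for x in inputs:
    x_t = torch.tensor([x])
    h = torch.tanh(x_t @ W_ih.T + b_ih + h @ W_hh.T + b_hh)

# The last element of the new final hidden state is the second predicted value
print('second prediction:', h[-1].item())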
