Last time I wrote about how I learned the basics of Convolutional Neural Networks with PyTorch. This time we’ll be dealing with Recurrent Neural Networks. The crucial trait of RNNs and their descendants is the ability to handle input of variable length. Whereas a CNN needs its input image resized to fit the expected input size, RNNs, LSTMs, generators and similar models process a sequence one element at a time, so its length does not have to be fixed in advance. I’ve come up with a pretty simple example so that we can go through the whole process and even calculate every tiny step on a sheet of paper if you want!
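To make the variable-length claim concrete, here’s a minimal sketch that feeds sequences of different lengths through one and the same *nn.RNN* layer. The sequence lengths (3, 7, 20) and the random input values are arbitrary choices of mine, and the layer sizes simply mirror the ones used in the exercise below.

```python
import torch
import torch.nn as nn

# One input feature and four hidden units, as in the exercise below.
rnn = nn.RNN(input_size=1, hidden_size=4)

shapes = {}
for seq_len in (3, 7, 20):  # arbitrary lengths, chosen for illustration
    # Unbatched input of shape (sequence length, input_size).
    sequence = torch.randn(seq_len, 1)
    output, hidden = rnn(sequence)
    # `output` holds one hidden vector per time step,
    # `hidden` holds only the final hidden state.
    shapes[seq_len] = (tuple(output.shape), tuple(hidden.shape))
    print(seq_len, shapes[seq_len])
```

No resizing or padding is needed here: the same weights are reused at every time step, so only the number of steps changes.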

## The exercise

We’re going to use PyTorch’s Recurrent Neural Network (*nn.RNN*) for the following case:

The function sin(x) is sampled at evenly spaced points between 0 and 2π. The network is given a sequence of these values, one value at a time, and its task is to predict the next value. Feeding each prediction back in as input then yields several upcoming values.

```
import math
import random
import torch
import torch.nn as nn


def generate_sine_values():
    num_points = 36
    min_sequence_length = 4
    prediction_length = 1
    start = random.randint(0, num_points - min_sequence_length - prediction_length)
    end = random.randint(start + min_sequence_length + prediction_length, num_points + 1)
    Y = []
    for i in range(start, end):
        x = i * math.pi * 2 / num_points
        y = math.sin(x)
        Y.append(y)
    # The last value is the target, the rest is the input sequence
    y = Y.pop()
    return torch.tensor(Y).view(-1, 1), torch.tensor(y)


# Set the hyperparameters
input_size = 1
hidden_size = 4
learning_rate = 0.1
num_epochs = 20000
num_to_be_generated = 10

# Create the RNN model
model = nn.RNN(input_size, hidden_size)

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

# Set model to training mode
model.train()

# Train the RNN model
for epoch in range(num_epochs):
    # Reset the hidden state for each sequence
    hidden = torch.zeros(1, hidden_size)

    # Generate input and target
    Y, y = generate_sine_values()

    # Predict sequence values
    output, _ = model(Y, hidden)
    # Use the last hidden unit at the last time step as the prediction
    y_pred = output[-1, :][-1]

    # Compute the loss
    loss = criterion(y_pred, y)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Print the loss every 10 epochs
    if (epoch + 1) % 10 == 0:
        print(f'Epoch: {epoch + 1}/{num_epochs}, Loss: {loss.item():.4f}')

# Set model to evaluation mode
model.eval()

# Print weights and biases
print('----------------------------------')
print('weight_ih_l0, ', model.weight_ih_l0)
print('bias_ih_l0, ', model.bias_ih_l0)
print('weight_hh_l0, ', model.weight_hh_l0)
print('bias_hh_l0, ', model.bias_hh_l0)
print('----------------------------------')

# Generate input
Y, _ = generate_sine_values()

# Predict some next sequence values
for _ in range(num_to_be_generated):
    # Reset the hidden state for each sequence
    hidden = torch.zeros(1, hidden_size)

    # Predict sequence values
    output, _ = model(Y, hidden)
    y_pred = output[-1, :][-1]

    # Append predicted value as a last known value for the next iteration
    Y = torch.cat((Y, y_pred.view(1, 1)))

# Print complete sequence with known and predicted values
for index, y in enumerate(Y):
    # Separate known values from predicted ones
    if index == Y.size(0) - num_to_be_generated:
        print('----------------------------------')
    print(y.item())
```

Running this code produces the following output. The exact numbers depend on the random weight initialization and on the randomly sampled training sequences, so you can’t expect to get exactly the same numbers on your machine.
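If you’d like repeatable runs, one option is to seed both random number generators the script relies on. Here’s a minimal sketch of that idea; the helper name `build_seeded_rnn` and the seed value 42 are hypothetical, and keep in mind that even with fixed seeds the numbers may still differ between PyTorch versions or hardware.

```python
import random
import torch
import torch.nn as nn


def build_seeded_rnn(seed):
    # Hypothetical helper: seed Python's `random` (used to sample the
    # sequences) and PyTorch (used to initialise the weights).
    random.seed(seed)
    torch.manual_seed(seed)
    model = nn.RNN(1, 4)
    # Draw one number the same way a start index would be drawn.
    first_start = random.randint(0, 31)
    return model, first_start


model_a, start_a = build_seeded_rnn(42)
model_b, start_b = build_seeded_rnn(42)

# Same seed, same initial weights and same first random draw.
print(torch.equal(model_a.weight_ih_l0, model_b.weight_ih_l0))
print(start_a == start_b)
```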

```
...
Epoch: 19980/20000, Loss: 0.0009
Epoch: 19990/20000, Loss: 0.0000
Epoch: 20000/20000, Loss: 0.0006
----------------------------------
weight_ih_l0, Parameter containing:
tensor([[0.4578],
[0.8740],
[0.7745],
[2.5193]], requires_grad=True)
bias_ih_l0, Parameter containing:
tensor([ 0.2991, -0.1368, 0.5886, 0.2219], requires_grad=True)
weight_hh_l0, Parameter containing:
tensor([[ 0.3623, 0.0076, -0.9166, -0.2516],
[ 0.4999, 0.1557, 0.0980, -0.7547],
[-0.2654, 0.5356, 0.6579, 0.0889],
[ 0.5429, 0.8763, 0.4217, -1.3401]], requires_grad=True)
bias_hh_l0, Parameter containing:
tensor([-0.5376, 0.6217, -0.2166, -0.3979], requires_grad=True)
----------------------------------
0.3420201539993286
0.1736481785774231
1.2246468525851679e-16
-0.1736481785774231
-0.3420201539993286
-0.5
----------------------------------
-0.6631757020950317
-0.7941378951072693
-0.8717836737632751
-0.9104017019271851
-0.9280122518539429
-0.9320352077484131
-0.9236619472503662
-0.9006775617599487
-0.8604729771614075
-0.803618848323822
```

A simple line chart depicts how our RNN has behaved. As you can see, this simple model turned out to be quite accurate. Your results will depend heavily on the length of the input sequence as well as on the exact point where the input ends. I’ve picked a relatively accurate run for this case.

The values behind this chart look like this, where *Input* and *Prediction* were taken from console output.

| i | Baseline | Input | Prediction |
|---|---|---|---|
| 0 | 0 | | |
| 1 | 0.17364817766693 | | |
| 2 | 0.342020143325669 | | |
| 3 | 0.5 | | |
| 4 | 0.642787609686539 | | |
| 5 | 0.766044443118978 | | |
| 6 | 0.866025403784439 | | |
| 7 | 0.939692620785908 | | |
| 8 | 0.984807753012208 | | |
| 9 | 1 | | |
| 10 | 0.984807753012208 | | |
| 11 | 0.939692620785908 | | |
| 12 | 0.866025403784439 | | |
| 13 | 0.766044443118978 | | |
| 14 | 0.642787609686539 | | |
| 15 | 0.5 | | |
| 16 | 0.342020143325669 | 0.342020153999329 | |
| 17 | 0.173648177666931 | 0.173648178577423 | |
| 18 | 1.22464679914735E-16 | 1.22E-16 | |
| 19 | -0.17364817766693 | -0.173648178577423 | |
| 20 | -0.342020143325669 | -0.342020153999329 | |
| 21 | -0.5 | -0.5 | -0.5 |
| 22 | -0.642787609686539 | | -0.663175702095032 |
| 23 | -0.766044443118978 | | -0.794137895107269 |
| 24 | -0.866025403784438 | | -0.872 |
| 25 | -0.939692620785908 | | -0.910401701927185 |
| 26 | -0.984807753012208 | | -0.928012251853943 |
| 27 | -1 | | -0.932035207748413 |
| 28 | -0.984807753012208 | | -0.923661947250366 |
| 29 | -0.939692620785909 | | -0.900677561759949 |
| 30 | -0.866025403784439 | | -0.860472977161408 |
| 31 | -0.766044443118978 | | -0.803618848323822 |
| 32 | -0.64278760968654 | | |
| 33 | -0.5 | | |
| 34 | -0.342020143325669 | | |
| 35 | -0.17364817766693 | | |
| 36 | -2.44929359829471E-16 | | |
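To put a number on how far the predictions drift from the baseline, here’s a small sketch that recomputes sin(x) for the predicted positions and compares it with the ten predicted values from the console output above. The starting index 22 is my reading of the table (the input ends at index 21), and the variable names are just for illustration.

```python
import math

num_points = 36
first_predicted_index = 22  # assumption: the input sequence ends at index 21

# The ten predicted values, copied from the console output above.
predictions = [
    -0.6631757020950317, -0.7941378951072693, -0.8717836737632751,
    -0.9104017019271851, -0.9280122518539429, -0.9320352077484131,
    -0.9236619472503662, -0.9006775617599487, -0.8604729771614075,
    -0.803618848323822,
]

errors = []
for offset, predicted in enumerate(predictions):
    i = first_predicted_index + offset
    baseline = math.sin(i * 2 * math.pi / num_points)
    errors.append((i, abs(predicted - baseline)))

for i, error in errors:
    print(f'{i}: {error:.4f}')

# Position with the largest absolute error.
worst_index, worst_error = max(errors, key=lambda pair: pair[1])
print('worst:', worst_index, round(worst_error, 4))
```

For this particular run the largest gap sits at the trough of the sine wave (index 27), where the prediction flattens out before reaching -1.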

Finally, let’s calculate a single step by hand to find out what this model really does behind the scenes. For this purpose I’m going to use Wolfram Alpha. According to PyTorch documentation the RNN cell does the following thing:

$$h_t = \tanh\left(x_t W_{ih}^T + b_{ih} + h_{t-1} W_{hh}^T + b_{hh}\right)$$

Below I paste the calculations step by step. Note that the result of each individual calculation is the hidden state for the next one. As stated in the formula above, only two things change: x_{t} (the current sine value) and h_{t-1} (the hidden state from the previous step). During the first step the hidden state is empty, so it equals (0, 0, 0, 0). The first 5 calculations over the already known values only build up the hidden state (the *memory*) of the RNN cell.
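If you’d rather not type all of this into a calculator, the same recurrence can be sketched in plain Python. This is only my re-implementation of the formula, not PyTorch’s actual code: the weights and biases are copied from the console output above (so they’re rounded to four decimal places), the inputs are the six known values of this run, and the helper name `rnn_step` is made up for the occasion.

```python
import math

# Weights and biases copied from the console output (rounded to 4 decimals).
W_ih = [0.4578, 0.8740, 0.7745, 2.5193]
b_ih = [0.2991, -0.1368, 0.5886, 0.2219]
W_hh = [
    [0.3623, 0.0076, -0.9166, -0.2516],
    [0.4999, 0.1557, 0.0980, -0.7547],
    [-0.2654, 0.5356, 0.6579, 0.0889],
    [0.5429, 0.8763, 0.4217, -1.3401],
]
b_hh = [-0.5376, 0.6217, -0.2166, -0.3979]

# The six known input values of this run.
inputs = [
    0.3420201539993286, 0.1736481785774231, 1.2246468525851679e-16,
    -0.1736481785774231, -0.3420201539993286, -0.5,
]


def rnn_step(x, h_prev):
    # h_t = tanh(x_t * W_ih^T + b_ih + h_{t-1} * W_hh^T + b_hh)
    h_next = []
    for j in range(len(h_prev)):
        pre = W_ih[j] * x + b_ih[j] + b_hh[j]
        pre += sum(W_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_next.append(math.tanh(pre))
    return h_next


h = [0.0, 0.0, 0.0, 0.0]  # empty hidden state before the first step
for x in inputs:
    h = rnn_step(x, h)
    print([round(value, 6) for value in h])

# The script above uses the last hidden unit as the predicted sine value.
prediction = h[-1]
print('prediction:', prediction)
```

Each printed row is the hidden state after one more input value, and the last element of the final row is the model’s first prediction.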

Look, the last value of this tensor, **-0.663155**, matches **-0.663175702095032**, the first value predicted by our RNN cell (the small difference presumably comes from the weights being printed with only four decimal places)! From now on the model predicts based on its own previous predictions, so the accuracy inevitably deteriorates. Still, for the sake of the exercise I find it really exciting, so let’s go one step further!