RNN exercises

Basic RNN

Download rnnsin.py from https://synalp.loria.fr/rnnsin.py
Install PyTorch, NumPy and matplotlib, then run the code
Analyze the results and the code, and explain in a few lines what it does
Set the number of hidden units in the RNN to 1 and run again: is it still learning?
Set the learning rate to 0.01: is it still learning?
What is the convergence rate? Justify in a few lines
Modify the code to predict the sinusoid at T+10: is it still learning? How does the convergence rate compare to the previous case? Explain in a few lines
Optional: modify the code to predict a 2-dimensional sinusoid
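
For the T+10 question, the only thing that has to change is how inputs and targets are aligned. rnnsin.py is not reproduced here, so the following is only a sketch of that alignment on a generic sine wave: the function name make_shifted_pairs, the variable names and the signal itself are hypothetical and will not match the script exactly.

```python
import numpy as np


def make_shifted_pairs(signal, horizon):
    """Return (inputs, targets) such that targets[t] == signal[t + horizon]."""
    signal = np.asarray(signal, dtype=np.float32)
    if not 0 < horizon < len(signal):
        raise ValueError("horizon must be between 1 and len(signal) - 1")
    inputs = signal[:-horizon]   # what the RNN reads at step t
    targets = signal[horizon:]   # what it must predict, horizon steps later
    return inputs, targets


# Hypothetical signal; the sinusoid used in rnnsin.py may differ.
t = np.arange(200)
wave = np.sin(0.1 * t)

x1, y1 = make_shifted_pairs(wave, horizon=1)     # predict T+1
x10, y10 = make_shifted_pairs(wave, horizon=10)  # predict T+10
print(x1.shape, y1.shape, x10.shape, y10.shape)
```

Note that a larger horizon shortens the usable sequence by the same number of steps.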

Attention

Modify rnnsin.py so that it summarizes all time-dependent hidden vectors with attention, using a single learnt query vector, and outputs one of two classes:

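# Model is the class defined in rnnsin.py; it is expected to provide
# self.hiddensize, self.rnn, self.fc and self.init_hidden().
# For two classes, self.fc has to output 2 values.
class ModelAtt(Model):
    def __init__(self):
        super().__init__()
        # Single learnt query vector, initialised with small random values
        qnp = 0.1 * np.random.rand(self.hiddensize)
        self.q = nn.Parameter(torch.Tensor(qnp))

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        # steps: hidden vector at every time step, (batch, time, hidden)
        steps, last = self.rnn(x, hidden)
        # One score per time step, normalised over time
        alpha = torch.matmul(steps, self.q)
        alpha = nn.functional.softmax(alpha, dim=1)
        # Weighted sum of the hidden vectors: (batch, hidden)
        alpha2 = alpha.unsqueeze(-1).expand_as(steps)
        weighted = torch.mul(steps, alpha2)
        rep = weighted.sum(dim=1)
        out = self.fc(rep)
        return out, alpha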
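
To check the tensor shapes in forward() without the rest of the script, the attention pooling can be replayed on random tensors. This is a minimal sketch: the sizes are arbitrary, steps is a random stand-in for the RNN outputs (assumed to be laid out as batch, time, hidden, which is what the softmax over dim=1 implies), and q is a random stand-in for the learnt query.

```python
import torch

torch.manual_seed(0)

# Arbitrary sizes, chosen only for the illustration.
batch, nsteps, hidden = 4, 50, 8
steps = torch.randn(batch, nsteps, hidden)  # stand-in for the RNN outputs
q = 0.1 * torch.rand(hidden)                # stand-in for the learnt query

scores = torch.matmul(steps, q)             # (batch, nsteps): one score per step
alpha = torch.softmax(scores, dim=1)        # weights sum to 1 over time
rep = (steps * alpha.unsqueeze(-1)).sum(dim=1)  # (batch, hidden)

# The same weighted sum written as a single einsum, as a cross-check.
rep_einsum = torch.einsum("bt,bth->bh", alpha, steps)

print(alpha.shape, rep.shape)
print(torch.allclose(rep, rep_einsum, atol=1e-6))
```

Broadcasting alpha.unsqueeze(-1) against steps gives the same result as the explicit expand_as used in the class above.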

Use 10000 epochs, a learning rate of 0.0001 and the RMSprop optimizer
Use CrossEntropyLoss() instead of MSELoss() to learn the two classes
Use the following data, which perturbs the curve either up (class 0) or down (class 1) at some random position:

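import math
import numpy as np
import torch

def f(x, offset):
    # Clean sinusoid with values between 0.2 and 0.8
    return 0.3*math.sin(0.1*x+offset)+0.5

nex = 100      # number of examples
nsteps = 50    # time steps per example
input_seqs = []
target_seqs = []
for ex in range(nex):
    offset = np.random.rand()   # random phase in [0, 1)
    input_seq = [f(x, offset) for x in range(nsteps)]
    cl = np.random.randint(2)   # class label: 0 or 1
    target_seqs.append(cl)
    # Class 0 shifts the curve up, class 1 shifts it down
    if cl == 0:
        perturb = 0.05
    else:
        perturb = -0.05
    # Perturb 5 consecutive steps starting at a position in [25, 44]
    pos = np.random.randint(25, 45)
    for t in range(pos, pos+5):
        input_seq[t] += perturb
    input_seqs.append(input_seq)

input_seq = torch.Tensor(input_seqs)
input_seq = input_seq.view(nex, nsteps, 1)   # (batch, time, features)
target_seq = torch.LongTensor(target_seqs)   # class indices for CrossEntropyLoss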
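
The loss and optimizer settings above could be wired together roughly as follows. This is a sketch under assumptions, not the training loop of rnnsin.py: TinyAttClassifier is a hypothetical stand-in for ModelAtt (the real Model base class is not shown here), the data are random placeholders with the same shapes as input_seq and target_seq, and only 5 epochs are run to keep the example quick.

```python
import torch
from torch import nn


class TinyAttClassifier(nn.Module):
    """Hypothetical stand-in for ModelAtt; not the Model class of rnnsin.py."""

    def __init__(self, hidden_size=16, n_classes=2):
        super().__init__()
        self.rnn = nn.RNN(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.q = nn.Parameter(0.1 * torch.rand(hidden_size))
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        steps, _ = self.rnn(x)                        # (batch, time, hidden)
        alpha = torch.softmax(steps @ self.q, dim=1)  # (batch, time)
        rep = (steps * alpha.unsqueeze(-1)).sum(dim=1)
        return self.fc(rep), alpha


torch.manual_seed(0)
# Placeholder data with the same shapes and dtypes as input_seq / target_seq.
x = torch.rand(20, 50, 1)
y = torch.randint(0, 2, (20,))

model = TinyAttClassifier()
criterion = nn.CrossEntropyLoss()   # expects raw logits and integer class labels
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0001)

losses = []
for epoch in range(5):              # the exercise asks for 10000 epochs
    optimizer.zero_grad()
    logits, alpha = model(x)        # logits: (batch, 2)
    loss = criterion(logits, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

accuracy = (logits.argmax(dim=1) == y).float().mean().item()
print(losses[-1], accuracy)
```

CrossEntropyLoss applies the softmax itself, so the model should return unnormalised scores, and the targets must be a LongTensor of class indices rather than one-hot vectors.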

Run it to train the classifier model
Does it learn to predict the two classes correctly? Is learning stable?
After training, plot both the input curve and the attention weights for the first 5 curves: does attention correctly spot the perturbation?
Try without the offset: what happens? Does attention spot the perturbation? Explain.
Try to find better hyper-parameters so that convergence is faster.
Modify the training loop so that random curves are generated directly inside the loop: there are no more epochs, only an infinite sequence of random batches. What happens?
Try longer vs. shorter and smaller vs. bigger perturbations: in which cases does it work or not? How sensitive is the approach to the perturbations?
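
For the last questions, it can help to wrap the curve generation in a function that returns a fresh batch on every call and exposes the perturbation length and size. The following is one possible sketch: random_batch and its parameters (length, size, use_offset, rng) are hypothetical names, and the defaults simply mirror the values used in the data code above.

```python
import math
import numpy as np


def f(x, offset):
    return 0.3 * math.sin(0.1 * x + offset) + 0.5


def random_batch(batch_size, nsteps=50, length=5, size=0.05,
                 use_offset=True, rng=None):
    """Draw a fresh batch of perturbed curves.

    Returns inputs of shape (batch_size, nsteps, 1), the class labels
    and the start position of each perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    low, high = nsteps // 2, nsteps - length
    if low >= high:
        raise ValueError("length is too large for this nsteps")
    seqs = np.empty((batch_size, nsteps, 1), dtype=np.float32)
    labels = rng.integers(0, 2, size=batch_size)
    positions = rng.integers(low, high, size=batch_size)
    for i in range(batch_size):
        offset = rng.random() if use_offset else 0.0
        curve = [f(t, offset) for t in range(nsteps)]
        sign = 1.0 if labels[i] == 0 else -1.0   # class 0 up, class 1 down
        for t in range(positions[i], positions[i] + length):
            curve[t] += sign * size
        seqs[i, :, 0] = curve
    return seqs, labels, positions


# One new batch per training step instead of a fixed dataset.
x, y, pos = random_batch(32, rng=np.random.default_rng(0))
print(x.shape, y[:5], pos[:5])
```

The arrays can be passed to torch.from_numpy() before being fed to the model; keeping the positions makes it easy to compare them with the peak of the attention weights.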
