RNN exercises


Basic RNN

  • Download rnnsin.py from https://synalp.loria.fr/rnnsin.py
  • Install pytorch+numpy+matplotlib and run the code
  • Analyze the results and the code, and explain in a few lines what it does (a minimal sketch of such a script is given after this list)
  • Modify the number of hidden units in the RNN to 1 and try again: is it still learning?
  • Modify the learning rate to 0.01: is it still learning?
  • What is the convergence rate? Justify your answer in a few lines
  • Modify the code to predict the sinusoid at T+10: is it still learning? How does the convergence rate compare to the previous case (explain in a few lines)?
  • Optional: modify the code to predict a 2-dimensional sinusoid
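
rnnsin.py is not reproduced here, but for reference, here is a minimal sketch of the kind of script the exercise assumes: an RNN trained to predict the next value of a sinusoid with an MSE loss. All names and hyper-parameter values below are illustrative guesses, not the actual file.

import torch
import torch.nn as nn

nsteps = 50
x = torch.sin(0.1 * torch.arange(nsteps + 1, dtype=torch.float))
input_seq = x[:-1].view(1, nsteps, 1)   # value at time t
target_seq = x[1:].view(1, nsteps, 1)   # value at time t+1

class Model(nn.Module):
    def __init__(self, hiddensize=10):
        super(Model, self).__init__()
        self.hiddensize = hiddensize
        self.rnn = nn.RNN(1, hiddensize, batch_first=True)
        self.fc = nn.Linear(hiddensize, 1)

    def init_hidden(self, batch_size):
        return torch.zeros(1, batch_size, self.hiddensize)

    def forward(self, x):
        hidden = self.init_hidden(x.size(0))
        steps, last = self.rnn(x, hidden)
        return self.fc(steps)            # one prediction per time step

model = Model()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(input_seq), target_seq)
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(epoch, loss.item())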

Attention

  • Modify rnnsin.py so that:
    • It summarizes all time-dependent hidden vectors with attention and a single learnt query vector, and outputs one of two classes:
class ModelAtt(Model):
    def __init__(self):
        super(ModelAtt, self).__init__()
        # learnt query vector, initialized with small random values
        qnp = 0.1*np.random.rand(self.hiddensize)
        self.q = nn.Parameter(torch.Tensor(qnp))

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        # steps: all hidden states, shape (batch, nsteps, hiddensize)
        steps, last = self.rnn(x, hidden)
        # one attention score per time step: (batch, nsteps)
        alpha = torch.matmul(steps, self.q)
        alpha = nn.functional.softmax(alpha, dim=1)
        # broadcast the weights over the hidden dimension
        alpha2 = alpha.unsqueeze(-1).expand_as(steps)
        weighted = torch.mul(steps, alpha2)
        # weighted sum over time: a fixed-size summary, (batch, hiddensize)
        rep = weighted.sum(dim=1)
        out = self.fc(rep)
        return out, alpha
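
As a quick sanity check of the shapes involved (illustrative only, with random tensors standing in for real hidden states and hiddensize=10): the matmul gives one score per time step, the softmax normalizes them over time, and the weighted sum yields one fixed-size summary vector per example.

import torch

steps = torch.randn(4, 50, 10)                        # (batch, time, hidden)
q = torch.randn(10)                                   # query vector
alpha = torch.softmax(torch.matmul(steps, q), dim=1)  # (4, 50)
rep = (steps * alpha.unsqueeze(-1)).sum(dim=1)        # (4, 10)
print(alpha.sum(dim=1))                               # each row sums to 1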
  • Use 10000 epochs, LR=0.0001 and the RMSprop optimizer (a sketch of such a training loop is given after the data-generation code below)
  • Use CrossEntropyLoss() instead of MSELoss() to learn the two classes
  • Use the following data, which perturbs the curve either up (class 0) or down (class 1) at some random position:
import math
import numpy as np
import torch

def f(x, offset):
    # smooth sinusoid, phase-shifted by a random offset
    return 0.3*math.sin(0.1*x+offset)+0.5

nex = 100     # number of examples
nsteps = 50   # time steps per curve
input_seqs = []
target_seqs = []
for ex in range(nex):
    offset = np.random.rand()
    input_seq = [f(x, offset) for x in range(nsteps)]
    # pick a class at random: class 0 pushes the curve up, class 1 down
    cl = np.random.randint(2)
    target_seqs.append(cl)
    if cl == 0: perturb = 0.05
    else: perturb = -0.05
    # perturb 5 consecutive points at a random position
    pos = np.random.randint(25, 45)
    for t in range(pos, pos+5): input_seq[t] += perturb
    input_seqs.append(input_seq)

# (nex, nsteps, 1) inputs and (nex,) integer class targets
input_seq = torch.Tensor(input_seqs)
input_seq = input_seq.view(nex, nsteps, 1)
target_seq = torch.LongTensor(target_seqs)
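
The training loop referred to above could look like this (a minimal sketch: it assumes the imports and definitions above, and that self.fc has been changed to output 2 class scores; it trains on all 100 curves as a single batch):

model = ModelAtt()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.0001)

for epoch in range(10000):
    optimizer.zero_grad()
    out, alpha = model(input_seq)        # out: (nex, 2) class scores
    loss = criterion(out, target_seq)    # target_seq: (nex,) class indices
    loss.backward()
    optimizer.step()
    if epoch % 500 == 0:
        acc = (out.argmax(dim=1) == target_seq).float().mean().item()
        print(epoch, loss.item(), acc)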
  • Run it to train the classifier model
  • Does it learn to predict the two classes correctly? Is learning stable?
  • After training, plot both the input curve and the attention weights for the first 5 curves (see the plotting sketch after this list): does attention correctly spot the perturbation?
  • Try without the offset: what happens? Does attention still spot the perturbation? Explain.
  • Try to find better hyper-parameters so that convergence is faster.
  • Modify the training loop so that random curves are generated directly inside it (see the second sketch after this list): there are no longer any epochs, only an infinite sequence of random batches: what happens?
  • Try with longer vs. shorter and smaller vs. bigger perturbations: in which cases does it work or fail? How sensitive is the approach to the perturbations?
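
For the plotting question above, a minimal sketch (assuming the trained model and input_seq defined earlier):

import matplotlib.pyplot as plt

with torch.no_grad():
    out, alpha = model(input_seq)

# overlay each input curve with its attention weights
fig, axes = plt.subplots(5, 1, figsize=(6, 10))
for i in range(5):
    axes[i].plot(input_seq[i, :, 0].numpy(), label='input curve')
    axes[i].plot(alpha[i].numpy(), label='attention weights')
    axes[i].legend()
plt.tight_layout()
plt.show()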

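For the infinite-stream variant, the loop can regenerate fresh curves at every step. A sketch, where make_batch is a hypothetical helper that wraps the data-generation code above:

def make_batch(batch_size=100, nsteps=50):
    # hypothetical helper: random perturbed curves, same recipe as above
    inputs, targets = [], []
    for _ in range(batch_size):
        offset = np.random.rand()
        seq = [f(x, offset) for x in range(nsteps)]
        cl = np.random.randint(2)
        perturb = 0.05 if cl == 0 else -0.05
        pos = np.random.randint(25, 45)
        for t in range(pos, pos + 5):
            seq[t] += perturb
        inputs.append(seq)
        targets.append(cl)
    return torch.Tensor(inputs).view(batch_size, nsteps, 1), torch.LongTensor(targets)

step = 0
while True:                  # no epochs: an endless stream of random batches
    xb, yb = make_batch()
    optimizer.zero_grad()
    out, alpha = model(xb)
    loss = criterion(out, yb)
    loss.backward()
    optimizer.step()
    step += 1
    if step % 500 == 0:
        print(step, loss.item())
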
See also