Basic RNN
- Download rnnsin.py from https://synalp.loria.fr/rnnsin.py
- Install PyTorch, NumPy and Matplotlib, then run the code
- Analyze the code and its results, and explain in a few lines what it does
- Set the number of hidden units in the RNN to 1 and run it again: is it still learning?
- Set the learning rate to 0.01: is it still learning?
- What is the convergence rate? Justify your answer in a few lines.
- Modify the code to predict the sinusoid at T+10: is it still learning? How does the convergence rate compare with the previous case? Explain in a few lines.
- Optional: modify the code to predict a 2-dimensional sinusoid
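Predicting at T+10 instead of T+1 only changes how targets are paired with inputs, and the convergence rate can be estimated empirically from the recorded loss curve. The sketch below is not taken from rnnsin.py: the helper names `make_pairs` and `convergence_rate`, the sine frequency of 0.1, the use of a (sin, cos) pair as the "2-dimensional sinusoid", and the assumption that the loss decreases roughly geometrically (linear convergence) are all hypothetical.

```python
import math

import numpy as np


def make_pairs(n_steps=50, horizon=1, offset=0.0, dims=1):
    """Hypothetical helper: build an (input, target) pair for k-step-ahead
    prediction of a sinusoid. With horizon=1 the target at step t is x[t+1];
    with horizon=10 it is x[t+10]."""
    t = np.arange(n_steps + horizon)
    # Assumption: for dims=2, dim 0 is sin and dim 1 is cos (phase shift pi/2).
    curves = [np.sin(0.1 * t + offset + d * math.pi / 2) for d in range(dims)]
    seq = np.stack(curves, axis=-1)  # shape (n_steps + horizon, dims)
    inputs = seq[:n_steps]           # x[0] .. x[n_steps - 1]
    targets = seq[horizon:]          # x[horizon] .. x[n_steps - 1 + horizon]
    return inputs, targets


def convergence_rate(losses):
    """Estimate r such that loss_{k+1} ~ r * loss_k (linear convergence),
    by fitting a straight line to log(loss) against the epoch index."""
    losses = np.asarray(losses, dtype=float)
    slope, _ = np.polyfit(np.arange(len(losses)), np.log(losses), 1)
    return math.exp(slope)
```

For example, `make_pairs(horizon=10)` returns inputs `x[0..49]` and targets `x[10..59]`. Passing the per-epoch losses of a run to `convergence_rate` gives a ratio below 1 when the loss decreases, and comparing the ratios of the T+1 and T+10 runs is one way to quantify the difference.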
Attention
- Modify rnnsin.py so that:
- It summarizes all time-dependent hidden vectors with attention and a single learnt query vector, and outputs one of two classes:
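```python
import numpy as np
import torch
import torch.nn as nn

# To be added inside rnnsin.py, after Model. Assumes Model provides
# self.hiddensize, self.rnn (built with batch_first=True), self.fc and
# init_hidden(); for classification, self.fc must output 2 logits.
class ModelAtt(Model):
    def __init__(self):
        super().__init__()
        # Single learnt query vector, one component per hidden unit.
        qnp = 0.1 * np.random.rand(self.hiddensize)
        self.q = nn.Parameter(torch.Tensor(qnp))

    def forward(self, x):
        batch_size = x.size(0)
        hidden = self.init_hidden(batch_size)
        steps, last = self.rnn(x, hidden)            # steps: (batch, time, hidden)
        alpha = torch.matmul(steps, self.q)          # (batch, time) attention scores
        alpha = nn.functional.softmax(alpha, dim=1)  # normalize over time steps
        alpha2 = alpha.unsqueeze(-1).expand_as(steps)
        weighted = torch.mul(steps, alpha2)
        rep = weighted.sum(dim=1)                    # (batch, hidden) summary vector
        out = self.fc(rep)                           # class logits
        return out, alpha
```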
- Train for 10000 epochs with the RMSprop optimizer and a learning rate of 0.0001
- Use CrossEntropyLoss() instead of MSELoss() to learn the two classes
- Use the following data, which perturbs the curve either up (class 0) or down (class 1) over 5 consecutive steps starting at a random position:
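```python
import math

import numpy as np
import torch

def f(x, offset):
    return 0.3 * math.sin(0.1 * x + offset) + 0.5

nex = 100    # number of example curves
nsteps = 50  # time steps per curve
input_seqs = []
target_seqs = []
for ex in range(nex):
    offset = np.random.rand()  # random phase shift for each curve
    input_seq = [f(x, offset) for x in range(nsteps)]
    cl = np.random.randint(2)
    target_seqs.append(cl)
    # Class 0 is perturbed upwards, class 1 downwards.
    if cl == 0:
        perturb = 0.05
    else:
        perturb = -0.05
    # Perturb 5 consecutive steps starting at a random position in [25, 44].
    pos = np.random.randint(25, 45)
    for t in range(pos, pos + 5):
        input_seq[t] += perturb
    input_seqs.append(input_seq)
input_seq = torch.Tensor(input_seqs)
input_seq = input_seq.view(nex, nsteps, 1)  # (batch, time, features)
target_seq = torch.LongTensor(target_seqs)  # class labels for CrossEntropyLoss
```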
- Run it to train the classifier model
- Does it learn to predict the two classes correctly? Is learning stable?
- After training, plot both the input curve and the attention weights for the first 5 curves: does attention correctly spot the perturbation?
- Try without the offset (offset = 0 for every curve): what happens? Does attention spot the perturbation? Explain.
- Try to find better hyperparameters so that convergence is faster.
- Modify the training loop so that random curves are generated directly inside it: there are no more epochs, only an infinite sequence of random batches. What happens?
- Try longer vs. shorter and smaller vs. bigger perturbations: in which cases does it work, and in which does it fail? How sensitive is the approach to the perturbations?
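The attention classifier and the "no more epochs" variant can be sketched without rnnsin.py, as a minimal standalone sketch under stated assumptions. It reproduces the attention pooling of ModelAtt and the data generator above, but draws a fresh random batch at every iteration. The names `make_batch`, `AttnClassifier` and `train`, the default hidden size of 10, and the `perturb`/`width`/`use_offset` knobs are hypothetical and not part of the original code.

```python
import math

import numpy as np
import torch
import torch.nn as nn


def f(x, offset):
    # Same curve as in the exercise data.
    return 0.3 * math.sin(0.1 * x + offset) + 0.5


def make_batch(batch_size=100, nsteps=50, perturb=0.05, width=5, use_offset=True):
    """Hypothetical helper: draw a fresh batch of perturbed curves.
    Class 0 is shifted up by `perturb`, class 1 down, over `width` steps.
    Assumes width < nsteps - 25 so that a start position exists."""
    seqs, labels = [], []
    for _ in range(batch_size):
        offset = np.random.rand() if use_offset else 0.0
        seq = [f(x, offset) for x in range(nsteps)]
        cl = np.random.randint(2)
        delta = perturb if cl == 0 else -perturb
        pos = np.random.randint(25, nsteps - width)  # [25, 44] for the defaults
        for t in range(pos, pos + width):
            seq[t] += delta
        seqs.append(seq)
        labels.append(cl)
    x = torch.tensor(seqs, dtype=torch.float32).view(batch_size, nsteps, 1)
    return x, torch.tensor(labels, dtype=torch.long)


class AttnClassifier(nn.Module):
    """Standalone variant of ModelAtt: RNN + single learnt query + 2-way linear head."""

    def __init__(self, hidden_size=10, n_classes=2):
        super().__init__()
        self.rnn = nn.RNN(1, hidden_size, batch_first=True)
        self.q = nn.Parameter(0.1 * torch.rand(hidden_size))
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        steps, _ = self.rnn(x)                # (B, T, H); initial hidden state is zeros
        scores = steps @ self.q               # (B, T): one score per time step
        alpha = torch.softmax(scores, dim=1)  # attention weights over time
        rep = (steps * alpha.unsqueeze(-1)).sum(dim=1)  # (B, H) weighted summary
        return self.fc(rep), alpha            # class logits, attention weights


def train(n_iters=10000, lr=1e-4, hidden_size=10, batch_size=100, **data_kwargs):
    """Train on a new random batch at every iteration (no fixed dataset, no epochs)."""
    model = AttnClassifier(hidden_size)
    opt = torch.optim.RMSprop(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    losses = []
    for _ in range(n_iters):
        x, y = make_batch(batch_size, **data_kwargs)
        logits, _ = model(x)
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return model, losses
```

After training, `model(x)[1][:5]` holds the attention weights of the first five curves of a batch `x`, which can be plotted with Matplotlib next to `x[:5, :, 0]`. Passing `use_offset=False`, `perturb=...` or `width=...` to `train` is one way to explore the last three questions.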
See also