The class consists of 8 videos, about 4 hours in total. It is divided into two main parts. The first part introduces the basics of Neural NLG. The second part focuses on three key topics in NLG, namely content selection, modelling the structure of the input, and transfer learning. Slides and videos are available online.
This first video introduces the various inputs and communicative goals of Natural Language Generation (NLG), mentions some NLG applications, and outlines the structure of the class. The first part is an introduction to the basics of Neural NLG; the second part focuses on three key topics in NLG: content selection, modelling the structure of the input, and transfer learning.
This second video introduces the encoder-decoder framework using Recurrent Neural Networks. It shows how the input can be encoded into a continuous representation and how the decoder generates the output text one word at a time.
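As a rough illustration of this framework (not the exact models from the video), here is a minimal PyTorch sketch of a GRU-based encoder-decoder; the class name, dimensions and dummy data are placeholders.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal GRU encoder-decoder: encode the input into a continuous
    vector, then generate the output one token at a time."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=128, hid_dim=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt):
        # src, tgt: (batch, seq_len) token ids
        _, hidden = self.encoder(self.src_emb(src))            # final hidden state summarises the input
        dec_out, _ = self.decoder(self.tgt_emb(tgt), hidden)   # teacher forcing during training
        return self.out(dec_out)                               # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq(src_vocab=5000, tgt_vocab=5000)
src = torch.randint(0, 5000, (2, 10))   # dummy input batch
tgt = torch.randint(0, 5000, (2, 12))   # dummy target batch (shifted right in practice)
print(model(src, tgt).shape)            # torch.Size([2, 12, 5000])
```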
In this video, we look at three main mechanisms that have been proposed to improve decoding: attention, copy and coverage. The video also covers other means of handling rare or unknown input tokens, such as delexicalisation and Byte Pair Encoding.
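For intuition, the sketch below implements plain dot-product attention over the encoder states; it is one simple instance of the attention idea rather than the specific variants covered in the video, and the tensor shapes and example data are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention(dec_state, enc_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, normalise with softmax, and return the weighted context."""
    # dec_state: (batch, hid), enc_states: (batch, src_len, hid)
    scores = torch.bmm(enc_states, dec_state.unsqueeze(2)).squeeze(2)  # (batch, src_len)
    weights = F.softmax(scores, dim=1)                                 # attention distribution over input tokens
    context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)   # (batch, hid) context vector
    return context, weights

dec_state = torch.randn(2, 256)
enc_states = torch.randn(2, 10, 256)
context, weights = attention(dec_state, enc_states)
print(context.shape, weights.shape)  # torch.Size([2, 256]) torch.Size([2, 10])
```

A copy mechanism would reuse such attention weights to copy input tokens directly into the output, and coverage would keep a running sum of the weights to penalise attending to the same tokens repeatedly.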
We now look at alternative neural networks which have been used to encode the input to NLG, namely improved RNNs (LSTM, bi-LSTM and GRU), Convolutional Neural Networks (CNNs) and Transformers.
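To make the contrast concrete, here is a small, assumed PyTorch sketch showing how the same embedded input could be passed through a bi-LSTM, a 1D CNN and a Transformer encoder layer; the hyperparameters are arbitrary placeholders.

```python
import torch
import torch.nn as nn

tokens = torch.randn(2, 10, 128)  # a dummy batch of embedded input tokens

# Bidirectional LSTM: reads the input left-to-right and right-to-left
bilstm = nn.LSTM(128, 64, batch_first=True, bidirectional=True)
bilstm_out, _ = bilstm(tokens)                           # (2, 10, 128)

# 1D CNN: a convolution window slides over the token sequence
cnn = nn.Conv1d(128, 128, kernel_size=3, padding=1)
cnn_out = cnn(tokens.transpose(1, 2)).transpose(1, 2)    # (2, 10, 128)

# Transformer encoder layer: self-attention relates every token to every other token
transformer = nn.TransformerEncoderLayer(d_model=128, nhead=8, batch_first=True)
trf_out = transformer(tokens)                            # (2, 10, 128)

print(bilstm_out.shape, cnn_out.shape, trf_out.shape)
```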
Evaluating generated text is difficult because there are many ways of saying the same thing. We will look at some standard automated metrics used in NLG (BLEU, ROUGE), discuss some of their shortcomings, and present alternative metrics which have recently been proposed to address them, namely BERTScore, BLEURT and PARENT.
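As a toy illustration of what n-gram metrics such as BLEU measure, the snippet below computes clipped n-gram precision, the core quantity behind BLEU; it is a simplification (single reference, no brevity penalty), not a reference implementation of the metric.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision: how many candidate n-grams also appear in
    the reference, with counts clipped to the reference counts."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref[g]) for g, c in Counter(cand).items())
    return clipped / max(len(cand), 1)

cand = "the cat sat on the mat".split()
ref = "the cat is sitting on the mat".split()
print(ngram_precision(cand, ref, n=1))  # unigram precision: 0.83...
print(ngram_precision(cand, ref, n=2))  # bigram precision: 0.6
```

Metrics like BERTScore, BLEURT and PARENT instead compare candidate and reference (or input) using learned contextual representations rather than surface n-gram overlap.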
In several NLG applications (e.g., summarisation), not all of the input is verbalised and the NLG model must learn to select those parts of the input which should be mapped to Natural Language. We will look at various methods which have been proposed to select content for extractive summarisation, abstractive summarisation and sentence compression.
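As a rough illustration of the extractive setting, here is a toy, frequency-based sentence selector; the neural approaches covered in the video learn such selection scores from data rather than using a hand-crafted heuristic, so this is only a sketch.

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    """Toy content selection: score each sentence by the document-level
    frequency of its words and keep the top-k sentences."""
    freq = Counter(w.lower() for s in sentences for w in s.split())
    def score(sent):
        toks = sent.lower().split()
        return sum(freq[w] for w in toks) / len(toks)
    return sorted(sentences, key=score, reverse=True)[:k]

doc = [
    "The museum opened a new wing dedicated to modern art.",
    "Tickets for the opening weekend sold out in hours.",
    "The museum also runs workshops for local schools.",
]
print(extractive_summary(doc, k=1))
```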
The input to NLG is often either a text or a graph. We will look at how hierarchical and ensemble models have been used to model the structure of a text, and at adaptations of LSTMs, CNNs and Transformers which have been proposed to better model graph-structured input.
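For intuition about graph encoders, the sketch below shows a minimal, assumed message-passing layer that updates each node from its neighbours via a binary adjacency matrix; the adaptations discussed in the video (e.g., graph-aware LSTMs and Transformers) are more elaborate.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """Toy graph encoder layer: each node updates its representation by
    averaging the transformed representations of its neighbours."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, nodes, adj):
        # nodes: (num_nodes, dim), adj: (num_nodes, num_nodes), 1 marks an edge
        neigh = adj @ self.linear(nodes)                     # sum over neighbours
        degree = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid division by zero
        return torch.relu(neigh / degree)                    # mean-pooled neighbourhood

nodes = torch.randn(4, 64)   # e.g. 4 entities from an input knowledge graph
adj = torch.tensor([[0, 1, 1, 0],
                    [1, 0, 0, 1],
                    [1, 0, 0, 0],
                    [0, 1, 0, 0]], dtype=torch.float)
print(GraphLayer(64)(nodes, adj).shape)  # torch.Size([4, 64])
```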
Labelled data (in the case of NLG, parallel input/output data) is difficult to obtain. We will look at how transfer learning can help address this issue. We will start with a short introduction to feature-based vs. pretraining-and-fine-tuning approaches and briefly review the history of transfer learning, looking at models such as ELMo, ULMFiT, GPT and BERT. We will then focus on how pretraining and fine-tuning can be used in NLG, e.g., by using pretrained encoder-decoders (BART, T5), using cross-lingual embeddings and pretrained language models to generate into multiple languages, using models pretrained on dialogues to either generate or retrieve a dialogue turn (DialoGPT, ConveRT), or using models which integrate language modelling with information retrieval.
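As a small, assumed example of reusing a pretrained encoder-decoder for generation (here T5 via the Hugging Face transformers library; the model name and input text are placeholders), the snippet below generates a summary without task-specific training; in practice the model would then be fine-tuned on parallel input/output data for the target NLG task.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load a pretrained encoder-decoder (T5 here; BART works the same way)
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# T5 uses task prefixes; summarisation is one of its pretraining tasks
text = ("summarize: The museum opened a new wing dedicated to modern art, "
        "and tickets for the opening weekend sold out in hours.")
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=30, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```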