class: center, middle
background-image:url(images/data-background-light.jpg)

# Lexical Resources

## Master TAL, Nancy, 2019-2020

#### Christophe Cerisara

.footnote[.bold[cerisara@loria.fr - CNRS / LORIA]]

---

.center[
## TP on Word Embeddings
]

- Mark /5
- I've sent an email with the mark to everyone from whom I received something: if you sent something but did not receive any email, please let me know before Friday!

---

.center[
## TP on Word Embeddings
]

- Classic error 1: bad models (overfitting)
  - subwords -> good for OOV words, but overfitting is easier
  - "information", "informations"
  - In the embedding space, rare words end up in arbitrary regions

---

.center[
## TP on Word Embeddings
]

- Classic error 2: train on test
  - **NEVER**!!

- Power of transfer learning
  - Good data is more data

---

.center[
## FrameNet
]

- Original theory: C. J. Fillmore, 1976

---

.center[
## FrameNet
]

- Original theory: C. J. Fillmore, 1976
- FrameNet project:
   - Developed at ICSI, Berkeley
   - Online Demo:
     - https://framenet.icsi.berkeley.edu/fndrupal/
   - Detailed documentation:
     - https://framenet2.icsi.berkeley.edu/docs/r1.7/book.pdf

---

.center[
## Using FrameNet
]

- Getting started:
```
from nltk.corpus import framenet as fn
# first time only: import nltk; nltk.download('framenet_v17')
```

---

.center[
## FrameNet basic notions
]

- **Frame**: official definition
  - "Script-like conceptual structure that describes a particular type of situation, object, or event along with the participants and props that are needed for that Frame"

```
fn.frames()
fn.frame('Motion')
f = fn.frame(7)
```

---

.center[
## FrameNet basic notions
]

- **Lexical Unit** LU ~= lemma in WordNet: word taken in a specific meaning
  - LUs evoke frames, e.g. Commerce_buy is evoked by:
    - buy.v
    - buyer.n
    - client.n
    - purchase [act].n
    - purchase.v
    - purchaser.n

---

.center[
## FrameNet basic notions
]

- **Lexical Unit** LU ~= lemma in WordNet: word taken in a specific meaning

```
fn.lus()
fn.lu(4896)
```

---

.center[
## FrameNet basic notions
]

- **frame elements** (FE) = roles in a frame

```
f.FE
```

---

.center[
## FrameNet basic notions
]

- **Relations** between frames:
  - Inheritance
  - Using: The child frame presupposes the parent frame as background, e.g. the "Speed" frame "uses" (or presupposes) the "Motion" frame
  - Subframe: The child frame is a subevent of a complex event represented by the parent, e.g. the "Criminal_process" frame has subframes of "Arrest", "Arraignment", "Trial", and "Sentencing".
  - …

```
f.frameRelations
```

---

.center[
## FrameNet basic notions
]

- **Relations** between frame elements: fn.fe_relations()

<img src="images/relations.png" width="90%"/>

---

.center[
## FrameNet basic notions
]

- Data type
  - NLTK's FrameNet reader implements everything as mappings from "keys" to "values" (dict-like objects)
  - look at `type(f.frameRelations[0])`
  - look at `type(fn.fe_relations()[0])`
  - So, to know which kind of relation you're manipulating (frame relation or FE relation), look at its `_type` key:

```
f.frameRelations[0]._type
fn.fe_relations()[0]._type
```
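
The `_type` key only says what kind of object you hold; the relation's own type (Inheritance, Using, Subframe, ...) is stored under its `type` key. A minimal sketch that tallies the relation types of a frame, assuming each relation is dict-like with `rel['type']['name']`, which is how NLTK's `AttrDict` relation objects appear to be built (`count_relation_types` is a hypothetical helper, not an NLTK function):

```python
from collections import Counter


def count_relation_types(relations):
    """Count frame relations by relation type name.

    Assumes each relation is dict-like and stores its type under
    relation["type"]["name"] (e.g. "Inheritance", "Using").
    """
    return Counter(relation["type"]["name"] for relation in relations)
```

Usage (needs the FrameNet data): `count_relation_types(fn.frame('Motion').frameRelations)`.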

---

.center[
## FrameNet corpora
]

- Full-text annotations = part of SemEval-07 shared task 19.
  - Texts from journals: 14k distinct words, 4020 sentences with LUs

---

.center[
## FrameNet corpora
]

- Full-text annotations = part of SemEval-07 shared task 19.
  - Texts from journals: 14k distinct words, 4020 sentences with LUs
- Exemplar sentences with partial annotations
  - Hand-picked sentences: 132k distinct words, 141k sentences

---

.center[
## FrameNet corpora
]

- list all documents:

```
fn.docs()
```

- list all documents metadata:

```
fn.docs_metadata()
```

- access a specific document:

```
fn.doc(id)
```

---

.center[
## FrameNet corpora
]

- access a sentence in a document, and its annotations:

```
fn.doc(6).sentence[0]
fn.doc(6).sentence[0].annotationSet[0]
```

---

.center[
## FrameNet corpora
]

- get all sentences with a frame:

```
fn.exemplars(frame='Motion')
```

---

.center[
## FrameNet corpora
]

- get all sentences with a lexical unit:

```
fn.exemplars('run')
```

---

.center[
## Exercise
]

Find the sentence and its annotations:

<img src="images/waltz.png" width="90%"/>

---

.center[
## Exercise
]

```
a=[x for x in fn.exemplars('waltz') if 'elbowing' in x.text]
```

---

.center[
## Searching
]

- Search in frames:

```
fn.frames('Motion')
```

- Is it case-sensitive?
- Use regular expressions
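
The pattern appears to be matched with Python's `re.search` against each frame name, so it is case-sensitive and can match anywhere in the name. The sketch below mimics that behaviour on a few illustrative names (`match_names` is a hypothetical helper and the list is not FrameNet output); the inline flag `(?i)` turns off case sensitivity, so `fn.frames(r'(?i)motion')` should also catch frames such as `Self_motion`:

```python
import re


def match_names(pattern, names):
    """Return the names in which `pattern` occurs (re.search semantics)."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


# Illustrative names only; real ones come from fn.frames()
names = ["Motion", "Self_motion", "Cause_motion", "Motion_directional"]

print(match_names("Motion", names))       # case-sensitive, matches anywhere
print(match_names(r"(?i)motion", names))  # case-insensitive
print(match_names(r"^Motion$", names))    # anchored: exact name only
```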

---

.center[
## Searching
]

- Search in lexical units:

```
fn.lus('xpress')
```

---

.center[
## Searching
]

- Retrieving a lexical unit by ID:
```
g=fn.lu(5372)
g.ID
g.definition
g.name
```

- Get the frame evoked by a LU:
```
g.frame
g.frame.name
```
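
Since every LU knows the frame it evokes, a set of LUs can be indexed by frame. A minimal sketch, assuming only that each LU has `.name` and `.frame.name` as on this slide (`lus_by_frame` is a hypothetical helper):

```python
from collections import defaultdict


def lus_by_frame(lus):
    """Map each frame name to the sorted names of the LUs that evoke it.

    Works on any iterable of objects exposing .name and .frame.name,
    such as the LU objects returned by fn.lus().
    """
    index = defaultdict(list)
    for lu in lus:
        index[lu.frame.name].append(lu.name)
    return {frame: sorted(lu_names) for frame, lu_names in index.items()}
```

Usage (needs the FrameNet data): `lus_by_frame(fn.lus(r'^run\.'))` groups the LUs whose name starts with "run." by the frame they evoke.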

---

.center[
## Exercise
]

- How many frames are there in your version of FrameNet?
- How many lexical units?

---

.center[
## Exercise
]

- How many frames are there in your version of FrameNet?

```
len(fn.frames())
```
- How many lexical units?

```
len(fn.lus())
```
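
LU names end with a part-of-speech suffix (`buy.v`, `purchase [act].n`), so a natural follow-up is to count LUs per part of speech. A minimal sketch, assuming the POS is whatever follows the last dot of the name (`count_pos` is a hypothetical helper):

```python
from collections import Counter


def count_pos(lu_names):
    """Count LU names by the POS suffix that follows their last dot."""
    return Counter(name.rsplit(".", 1)[-1] for name in lu_names)
```

Usage (needs the FrameNet data): `count_pos(lu.name for lu in fn.lus())`.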

---

.center[
## Getting help
]

- Google search is very good
- When offline:

```
help(fn)
```

---

.center[
## Exercise
]

List the names of the frames that are evoked by the LU run.v

---

.center[
## Exercise
]

List the names of the frames that are evoked by the LU run.v

```
[x.frame.name for x in fn.lus('run') if x.name=='run.v']
```

---

.center[
## Exercise
]

- Using the help() function, what is the name of the attribute describing the lemma of an LU?

---

.center[
## Exercise
]

- Find all LUs that share the frame with 'car'
- Print them along with their definition
- Find all frames whose name includes "contain" (first letter capitalized or not)

---

.center[
## Exercise
]

- Find all LUs that share the frame with 'car'

```
car_frame = fn.lus(r'^car\.n$')[0].frame
lus = list(car_frame.lexUnit.values())
```

---

.center[
## Exercise
]

- Print them along with their definition

```
for lu in lus:
    print(lu.name, lu.definition)
```

---

.center[
## Exercise
]

- Find all frames whose name includes "contain" (first letter capitalized or not)

```
fn.frames('[Cc]ontain')
```

---

.center[
## Exercise
]

- Useful attributes of a frame:
  - name, definition, FE, lexUnit, frameRelations

- What relations does the Giving frame have?
- Print the FE of Giving
- Print the core FE of Giving

---

.center[
## Exercise
]

- What relations does the Giving frame have?

```
give = fn.frame('Giving')
give.frameRelations
```

- Print the FE of Giving

```
print(give.FE)
```

- Print the core FE of Giving

```
for name, fe in give.FE.items():
    if fe.coreType == 'Core':
        print(name)
```
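
`coreType` takes other values besides `'Core'` (e.g. `'Peripheral'`, `'Extra-Thematic'`). A minimal sketch that groups all FEs of a frame by core type, assuming `frame.FE` maps FE names to objects with a `.coreType` attribute, as NLTK's reader appears to do (`fes_by_core_type` is a hypothetical helper):

```python
from collections import defaultdict


def fes_by_core_type(fe_dict):
    """Group FE names by their coreType ('Core', 'Peripheral', ...).

    `fe_dict` maps FE names to objects exposing a .coreType attribute.
    """
    groups = defaultdict(list)
    for name, fe in fe_dict.items():
        groups[fe.coreType].append(name)
    return {core_type: sorted(fe_names) for core_type, fe_names in groups.items()}
```

Usage (needs the FrameNet data): `fes_by_core_type(fn.frame('Giving').FE)`.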

---

.center[
## FrameNet, WordNet, VerbNet
]

From Shi et al. 2005:

- FrameNet provides a good generalization across predicates using frames and semantic roles
    - FrameNet does not explicitly define selectional restrictions for semantic roles
- VerbNet defines syntactic-semantic relations in a more explicit way
- One of the most useful properties of WordNet is its almost complete coverage of English verbs, and the rich information it encodes about semantic relations between verb senses

---

.center[
## ConceptNet
]

- Integrates information from many sources: Wiktionary, DBpedia, WordNet, etc.
- Online demo:
  - http://conceptnet.io
- Code:
  - https://github.com/commonsense/conceptnet5

---

.center[
## ConceptNet
]

- Network of words joined by relations (>30):
  - RelatedTo (stick - branch)
  - FormOf (sticks - stick)
  - AtLocation (stick - forest)
  - PartOf (stick - ice hockey)
  - Antonym (stick - automatic)
  - UsedFor (stick - force)
  - …
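
ConceptNet can also be queried over HTTP. A sketch assuming the public REST API at `http://api.conceptnet.io/c/en/<term>` returns JSON with an `edges` list whose items carry `rel`, `start` and `end` objects, each with a `label`; the service may be slow or offline and results may be paginated, so this only sees the first page (`fetch_concept` and `group_edges` are hypothetical helpers):

```python
import json
import urllib.request
from collections import defaultdict


def fetch_concept(term, lang="en"):
    """Download the JSON for one concept (assumed endpoint, see above)."""
    url = f"http://api.conceptnet.io/c/{lang}/{term}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def group_edges(concept_json):
    """Map each relation label to a list of (start, end) label pairs."""
    groups = defaultdict(list)
    for edge in concept_json.get("edges", []):
        groups[edge["rel"]["label"]].append(
            (edge["start"]["label"], edge["end"]["label"])
        )
    return dict(groups)
```

Usage (needs network access): `group_edges(fetch_concept('stick'))` maps relation names such as `RelatedTo` to (start, end) pairs.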

---

.center[
## VerbNet
]

- Description of the argument structures of verbs
- Theoretical basis: Levin B. (1993), *English Verb Classes and Alternations*, Univ. of Chicago Press
- Basic unit = verb class
  - Different verbs in the same class have the same argument structure

---

.center[
## VerbNet
]

Verb classes

- Each class characterized by:
  - A set of thematic roles (participants in the event)
  - Restrictions on role fillers (for participants)
  - Syntactic frames (how participants can be expressed syntactically)

---

.center[
## VerbNet
]

Example

- Class *Hit-18.1* (bang, bash, hit, kick...)
- Roles and restrictions:

| Paula hit      | the ball    | with a stick |
|----------------|-------------|--------------|
| Agent          | Patient     | Instrument   |
| [+int_control] | [+concrete] | [+concrete]  |

([+int_control] = intentional control)

---

.center[
## VerbNet
]

Thematic roles

- **Agent**: initiator of the action, typically acting intentionally
- **Patient**: participant undergoing an action
- **Theme**: Undergoer that is central to an event or state that does not have control over the way the event occurs, is not structurally changed by the event, and/or is characterized as being in a certain position or condition throughout the state.
- **Instrument**
- **Beneficiary**
- ...

(see https://verbs.colorado.edu/verb-index/vn3.3/themroles/ )

---

.center[
## VerbNet
]

Selectional restriction classes

<img src="images/verbnetsels.png" height="50%"/>

---

.center[
## VerbNet
]

VerbNet frames

- In VerbNet, syntactic frames provide mappings between the verb's arguments and their syntactic expression

Example for **collaborate**:

Frame 1:

| **NP V PP.theme** | "They collaborated on the task"            |
|-------------------|--------------------------------------------|
| *syntax*          | Agent <+plural> V {on} Theme <-sentential> |

Frame 2:

| **NP V PP.theme S_ING** | "They collaborated in finishing the task" |
|-------------------------|-------------------------------------------|
| *syntax*                | Agent <+plural> V {in} Theme <+sc_ing>    |

---

.center[
## VerbNet
]

## VerbNet in Python

Must use Python 3!

```
import nltk
nltk.download('verbnet')
from nltk.corpus import verbnet as vn
```

---

.center[
## VerbNet
]

List all lemmas

```
vn.lemmas()
```

List VerbNet classes for a lemma:

```
vn.classids(lemma='withdraw')
```

Exercise: write a function that returns whether the classes of two lemmas are the same, distinct, or overlapping
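
One possible solution, as a sketch: it treats the classes as sets and returns `'same'` when they are identical, `'overlap'` when they share some but not all classes, and `'distinct'` otherwise (including when a lemma is absent from VerbNet). `compare_classes` is a hypothetical name; `vn` defaults to `nltk.corpus.verbnet`, but any object with a compatible `classids(lemma=...)` method works:

```python
def compare_classes(lemma1, lemma2, vn=None):
    """Return 'same', 'overlap' or 'distinct' for the VerbNet classes of two lemmas."""
    if vn is None:
        from nltk.corpus import verbnet as vn  # needs the verbnet data
    classes1 = set(vn.classids(lemma=lemma1))
    classes2 = set(vn.classids(lemma=lemma2))
    if classes1 and classes1 == classes2:
        return "same"
    if classes1 & classes2:
        return "overlap"
    return "distinct"  # no shared class, or a lemma unknown to VerbNet
```

Usage: `compare_classes('hit', 'kick')`.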

---

.center[
## VerbNet
]

Retrieving a class

```
withdraw = vn.vnclass('withdraw-82-1')
```

Get lemmas for a class

```
vn.lemmas(vnclass=withdraw)
```

Exercise: write a function vn_neighbors(v) that returns a list of lemmas that share at least one VerbNet class with the verb v
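
A sketch of one possible `vn_neighbors`, assuming `vn.lemmas(vnclass=...)` accepts a class identifier string as well as a class element. It only collects the direct members of each class, so members of sub- or superclasses are not included; the optional `vn` parameter is an addition for testing:

```python
def vn_neighbors(v, vn=None):
    """Return the sorted lemmas sharing at least one VerbNet class with v."""
    if vn is None:
        from nltk.corpus import verbnet as vn  # needs the verbnet data
    neighbors = set()
    for classid in vn.classids(lemma=v):
        neighbors.update(vn.lemmas(vnclass=classid))
    neighbors.discard(v)  # v is a member of its own classes
    return sorted(neighbors)
```

Usage: `vn_neighbors('withdraw')`.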

---

.center[
## VerbNet
]

Frames of a class

```
vn.frames(withdraw)
```

Exercise: How many syntactic frames are listed in VerbNet?
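
A sketch, assuming `vn.classids()` without arguments lists every class and subclass identifier and `vn.frames(classid)` returns only the frames declared in that (sub)class, so no frame is counted twice (`count_frames` is a hypothetical helper):

```python
def count_frames(vn=None):
    """Total number of syntactic frames over all VerbNet classes and subclasses."""
    if vn is None:
        from nltk.corpus import verbnet as vn  # needs the verbnet data
    return sum(len(vn.frames(classid)) for classid in vn.classids())
```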

---

.center[
## VerbNet
]

Frame attributes

```
wd1 = vn.frames(withdraw)[0]
wd1['description']
wd1['example']
wd1['syntax']
wd1['semantics']
```
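
Each frame is a plain Python dict. A minimal sketch that condenses one into a single line, assuming `description` holds `primary`/`secondary` strings, `syntax` is a list of dicts with a `pos_tag` key and `example` is a string, which is how NLTK's VerbNet reader appears to build frames (`frame_summary` is a hypothetical helper):

```python
def frame_summary(frame):
    """Condense a VerbNet frame dict into one line."""
    description = frame["description"]
    pos_tags = " ".join(element["pos_tag"] for element in frame["syntax"])
    return (f"{description['primary']} ({description['secondary']}): "
            f"{pos_tags} | {frame['example']}")
```

Usage: `for frame in vn.frames(withdraw): print(frame_summary(frame))`.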

---

.center[
## VerbNet
]

Exercise

- How many distinct syntaxes are there among VerbNet's syntactic frames?

- Extract a list of verbs that can be used in the Basic intransitive syntactic frame (i.e., subject NP + verb). Hint: look at frame descriptions
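
One possible approach, as a sketch: "syntax" is taken here to be the sequence of POS tags of a frame (ignoring restrictions and preposition values, so finer distinctions are merged), and a class counts as basic intransitive if one of its frames has the primary description `NP V`. Only a class's direct members are returned, although VerbNet subclasses inherit their parent's frames. All function names are hypothetical:

```python
def syntax_key(frame):
    """Represent a frame's syntax as the tuple of its POS tags."""
    return tuple(element["pos_tag"] for element in frame["syntax"])


def distinct_syntaxes(vn=None):
    """Set of distinct syntax keys over all frames of all (sub)classes."""
    if vn is None:
        from nltk.corpus import verbnet as vn  # needs the verbnet data
    return {
        syntax_key(frame)
        for classid in vn.classids()
        for frame in vn.frames(classid)
    }


def basic_intransitive_verbs(vn=None):
    """Sorted lemmas of classes having a frame whose primary description is 'NP V'."""
    if vn is None:
        from nltk.corpus import verbnet as vn
    verbs = set()
    for classid in vn.classids():
        if any(frame["description"]["primary"].strip() == "NP V"
               for frame in vn.frames(classid)):
            verbs.update(vn.lemmas(vnclass=classid))
    return sorted(verbs)
```

Usage: `len(distinct_syntaxes())` and `basic_intransitive_verbs()`.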

---
name: last-page
class: middle, center, inverse

## That's all folks (for now)!

Slideshow created using [remark](http://github.com/gnab/remark).