class: center, middle
background-image: url(images/data-background-light.jpg)

# Lexical Resources

## Master TAL, Nancy, 2019-2020

#### Christophe Cerisara

.footnote[.bold[cerisara@loria.fr - CNRS / LORIA]]

---

.center[
## Semantic representations
]

- **Semantic parsing** = converting a sentence into a machine-understandable representation of its meaning

---

.center[
## Semantic representations
]

- **Semantic parsing** = converting a sentence into a machine-understandable representation of its meaning
- There exist several **Semantic Representation Schemes**

---

.center[
## Semantic representations
]

- **Semantic parsing** = converting a sentence into a machine-understandable representation of its meaning
- There exist several **Semantic Representation Schemes**
- 2 main categories of SRS

---

.center[
## Semantic representations
]

- **Semantic parsing** = converting a sentence into a machine-understandable representation of its meaning
- There exist several **Semantic Representation Schemes**
- 2 main categories of SRS
  - Logical Semantic Representations
    - formal representation language (lambda calculus)
    - or grammars
    - or rule-based transformations over syntactic structures

---

.center[
## Semantic representations
]

- **Semantic parsing** = converting a sentence into a machine-understandable representation of its meaning
- There exist several **Semantic Representation Schemes**
- 2 main categories of SRS
  - Logical Semantic Representations
    - formal representation language (lambda calculus)
    - or grammars
    - or rule-based transformations over syntactic structures
  - Shallow Semantic Representations
    - identify entities in a sentence and assign them a role
    - = slot-filling or frame-semantic parsing
    - FrameNet, PropBank, UCCA, AMR... see (Abend & Rappoport, 2017)

---

.center[
## FrameNet
]

- C. J. Fillmore, 2006

---

.center[
## FrameNet
]

- C. J. Fillmore, 2006
- Based on the Frame Semantics theory (Baker, Fillmore & Lowe, 1998)

---

.center[
## FrameNet
]

- C. J. Fillmore, 2006
- Based on the Frame Semantics theory (Baker, Fillmore & Lowe, 1998)
- FrameNet project:
  - Developed at ICSI Berkeley
  - https://framenet.icsi.berkeley.edu/fndrupal/
  - 1224 frames, each with:
    - a list of (lemma, POS) pairs = **lexical units**
      - 13640 LUs with their frames
      - LUs = mostly verbs and action nouns, plus a few adjectives, prepositions and adverbs
    - a list of **frame elements** (FE), core or non-core
      - average of 10 FEs per frame
  - 1878 frame-frame relations = similar semantic situations
  - 10725 FE-FE relations

---

.center[
## FrameNet
]
---

.center[
## FrameNet
]

- FrameNet has been used in:
  - Question answering
  - Information extraction
  - Paraphrasing
  - Textual entailment detection
- Other languages:
  - French
  - Portuguese
  - German
  - Spanish
  - Japanese
- Multilingual? (Torrent, Borin & Baker, 2018)

---

.center[
## How to parse with FrameNet?
]

4 subtasks:

- Trigger identification -> LU
- Frame identification -> Frame
- Argument segmentation -> FE spans
- Argument identification -> FE roles

---

.center[
## PropBank
]

- Also known as **Semantic Role Labeling**
- Palmer, Gildea & Kingsbury, 2005
- = coarse-grained version of FrameNet

---

.center[
## PropBank
]

- Also known as **Semantic Role Labeling**
- Palmer, Gildea & Kingsbury, 2005
- = coarse-grained version of FrameNet
- Only identifies the most important roles (who, when, where...)

---

.center[
## PropBank
]

- Also known as **Semantic Role Labeling**
- Palmer, Gildea & Kingsbury, 2005
- = coarse-grained version of FrameNet
- Only identifies the most important roles (who, when, where...)
- Triggers = verbs, and more recently nouns

---

.center[
## PropBank
]

- Also known as **Semantic Role Labeling**
- Palmer, Gildea & Kingsbury, 2005
- = coarse-grained version of FrameNet
- Only identifies the most important roles (who, when, where...)
- Triggers = verbs, and more recently nouns
- Frames are often defined using VerbNet (Kipper, Korhonen, Ryant & Palmer, 2008)

---

.center[
## PropBank
]

- Also known as **Semantic Role Labeling**
- Palmer, Gildea & Kingsbury, 2005
- = coarse-grained version of FrameNet
- Only identifies the most important roles (who, when, where...)
- Triggers = verbs, and more recently nouns
- Frames are often defined using VerbNet (Kipper, Korhonen, Ryant & Palmer, 2008)
- Same roles across frames:
  - Core A0 = agent
  - Core A1 = patient
  - Core A2 = instrument, beneficiary, attribute
  - Core A3 = starting point, benefactive, attribute
  - Core A4 = ending point
  - Modifiers: AM-TMP (time), AM-LOC (place), AM-PNC (purpose)...
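---

.center[
## PropBank example
]

The role inventory above can be illustrated on a toy sentence. This is a made-up illustration, not an actual PropBank annotation, written here as a Python dict:

```
# Hypothetical PropBank-style labeling of
# "Mary gave John a book yesterday"
srl = {
    "predicate": "give.01",  # the trigger: sense 01 of the verb "give"
    "A0": "Mary",            # agent: the giver
    "A1": "a book",          # patient: the thing given
    "A2": "John",            # beneficiary: the recipient
    "AM-TMP": "yesterday",   # temporal modifier
}
print(srl["A0"], srl["predicate"], srl["A1"])
```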
---

.center[
## AMR
]

- Abstract Meaning Representation extends PropBank to DAGs
- 1 DAG per sentence that represents named entities, coreference, semantic relations, temporal entities...

---

.center[
## UCCA
]

- = cross-lingual semantic representation scheme
- English, French, German (Czech, Russian, Hebrew)
- Sentence semantics = DAG
  - Leaves = tokens
  - Non-leaf nodes = semantic units
  - Edges = semantic roles

---

.center[
## WordNet
]

- = lexical database for English:
  - synset = group of words sharing the same concept
  - gloss = definition of a synset
  - relations between synsets (hyponymy...)
- Navigate the graph: http://wordnetweb.princeton.edu/perl/webwn
- More info, downloads: https://wordnet.princeton.edu/

---

.center[
## WordNet
]

Exercise: "zoo" has only one sense, hence one lemma, which belongs to the synset {menagerie, zoo, zoological garden}

- Find the gloss of "zoo"
- How many synsets are there for "WordNet"?
- What synonyms does the word "table" have?

---

.center[
## WordNet in Python
]

```
from nltk.corpus import wordnet as wn
zoo_synsets = wn.synsets("zoo")
len(zoo_synsets)
```

---

.center[
## WordNet in Python
]

Exercise:

- Are WordNet synsets implemented as sets of lemmas? (use the type() function)
- Does the lemmas() method return a set of lemmas?

---

.center[
## WordNet in Python
]

Exercise:

- Are WordNet synsets implemented as sets of lemmas? (use the type() function)
  - No: a dedicated Synset class
- Does the lemmas() method return a set of lemmas?
  - No: a list of lemmas

---

.center[
## Filtering by Part-Of-Speech
]

```
wn.synsets("try", pos=wn.NOUN)
wn.synsets("try", pos=wn.VERB)
wn.synsets("dry", pos=wn.NOUN)
wn.synsets("dry", pos=wn.VERB)
wn.synsets("dry", pos=wn.ADJ)
```

---

.center[
## Some synset functions
]

```
d1 = wn.synsets("dry", pos=wn.ADJ)[0]
d1.name()
d1.lemmas()
d1.definition()
d1.examples()
```

---

.center[
## Some synset functions
]

Exercise:

- What Python types do the lemmas, definition and examples methods produce?
- Using Python, print the definitions of all synsets of "dry"

---

.center[
## Some synset functions
]

Exercise:

- What Python types do the lemmas, definition and examples methods produce?
  - a list of lemmas, a string, a list of strings
- Using Python, print the definitions of all synsets of "dry"

```
for d in wn.synsets("dry"):
    print(d.definition())
```

---

.center[
## Synset names
]

- Synsets are identified by a string in the format *lemma.pos.num*

```
wn.synset("zoo.n.01")
wn.synset("menagerie.n.02")
```

- What is the difference between the two?

---

.center[
## Lemma functions
]

```
wn.lemma("dry.a.01.dry").antonyms()
wn.lemma("dry.a.01.dry").name()
wn.lemma("dry.a.01.dry").count()
wn.lemma("dry.a.01.dry").derivationally_related_forms()
```

---

.center[
## Exercise
]

- From Python, get the frequencies of the first lemmas of the adjectives
  - *dry* and its antonym
  - *good* and its antonym
  - *warm* and its antonym
- Define a function ant_freq(x) that returns the frequency of the antonym of a lemma

---

.center[
## Exercise
]

- From Python, get the frequencies of the first lemmas of the adjectives
  - *dry* and its antonym

```
d = wn.lemma('dry.a.01.dry')
print(d.count(), d.antonyms()[0].count())
```

- Define a function ant_freq(x) that returns the frequency of the antonym of a lemma

```
def ant_freq(x):
    return x.antonyms()[0].count()
```

---

.center[
## Exercise
]

- Print the count of all the lemmas of *wet* along with their definitions
- Define a function *frequent_synsets* that outputs the list of synsets of a word that have positive frequencies

---

.center[
## Exercise
]

- Print the count of all the lemmas of *wet* along with their definitions

```
for sy in wn.synsets('wet'):
    print(wn.lemma(sy.name() + ".wet").count(), sy.definition())
```

- Define a function *frequent_synsets* that outputs the list of synsets of a word that have positive frequencies

```
def is_frequent(synset, word):
    for lem in synset.lemmas():
        if lem.name() == word and lem.count() > 0:
            return True
    return False

def frequent_synsets(word):
    return [sy for sy in wn.synsets(word) if is_frequent(sy, word)]
```

---

.center[
## Basic Synset relations
]

```
cat = wn.synset('cat.n.01')
man = wn.synset('man.n.01')
cat.hypernyms()
cat.root_hypernyms()
man.hyponyms()
```

---

.center[
## Basic Lemma relations
]

Compare:

```
wn.lemma('dry.a.01.dry').antonyms()
wn.synset('cat.n.01').hypernyms()
```

---

.center[
## Lexicographer files
]

- Synsets are grouped into files: https://wordnet.princeton.edu/documentation/lexnames5wn

```
00  adj.all        all adjective clusters
01  adj.pert       relational adjectives (pertainyms)
02  adv.all        all adverbs
03  noun.Tops      unique beginner for nouns
04  noun.act       nouns denoting acts or actions
05  noun.animal    nouns denoting animals
06  noun.artifact  nouns denoting man-made objects
…
```

---

.center[
## Lexicographer files
]

- Query which file a synset is in:

```
man.lexname()
cat.lexname()
```

- Exercise: define a function *same_file(x,y)* that checks whether synsets x and y belong to the same lexicographer file

---

.center[
## Lexicographer files
]

- Query which file a synset is in:

```
man.lexname()
cat.lexname()
```

- Exercise: define a function *same_file(x,y)* that checks whether synsets x and y belong to the same lexicographer file

```
def same_file(x, y):
    return x.lexname() == y.lexname()
```

---

.center[
## Hierarchy
]

```
cat.min_depth()
cat.max_depth()
```

---

.center[
## Hierarchy
]

```
cat.hypernym_paths()
cat.hypernym_distances()
```

Sort these distances:

```
sec = lambda x: x[1]
sorted(cat.hypernym_distances(), key=sec)
```

---

.center[
## Hierarchy
]

```
cat.lowest_common_hypernyms(man)
cat.common_hypernyms(man)
```

---

.center[
## Synset Comparisons
]

```
cat.path_similarity(man)
```

- gives 1/(1+L) with L = length of the shortest path between cat and man
- Other measures:

```
cat.wup_similarity(man)
cat.lch_similarity(man)
cat.shortest_path_distance(man)
```

---

.center[
## Beyond direct neighbours
]

- Recursive closure:

```
hypo = lambda x: x.hyponyms()
for x in cat.closure(hypo):
    print(x)
```

---

.center[
## Beyond direct neighbours
]

```
hypo = lambda x: x.hyponyms()
cat.tree(hypo)
cat.tree(hypo, depth=2)
```

---

.center[
## Exercise
]

1. Define a function that returns frequent hypernym lemmas
2. Define a function that returns the hypernyms of a synset, but not too general ones (min depth of 5)
3. Define a function that returns the common hypernyms of two synsets, but not too general ones (min depth of 5)
4. Define a function *neighbors(synset,k,m)* that returns the top k nearest neighbors of synset according to a similarity metric m

---

.center[
## Exercise
]

1) Define a function that returns frequent hypernym lemmas

```
def freqhyper(x):
    cands = [z for h in x.hypernyms() for z in h.lemmas()]
    return [z for z in cands if z.count() > 0]
```

---

.center[
## Exercise
]

2) Define a function that returns the hypernyms of a synset, but not too general ones (min depth of 5)

```
def specific_hypernyms(x):
    hyper = lambda z: z.hypernyms()
    cands = x.closure(hyper)
    return [h for h in cands if h.min_depth() >= 5]
```

---

.center[
## Exercise
]

3) Define a function that returns the common hypernyms of two synsets, but not too general ones (min depth of 5)

```
def specific_common_hypernyms(x, y):
    return [z for z in x.common_hypernyms(y) if z.min_depth() >= 5]
```

---

.center[
## Exercise
]

4) Define a function *neighbors(synset,k,m)* that returns the top k nearest neighbors of synset according to a similarity metric m

```
def neighbors(synset, k, m):
    # candidates = hyponyms of the synset and of its hypernyms, up to depth k
    hypernyms = synset.closure(lambda z: z.hypernyms(), k)
    candidates = list(synset.closure(lambda z: z.hyponyms(), k))
    for w in hypernyms:
        candidates.extend(w.closure(lambda z: z.hyponyms(), k))
    candidates = sorted(set(candidates), key=lambda w: m(synset, w), reverse=True)
    return candidates[1:k+1]  # skip the first candidate: the synset itself

neighbors(cat, 5, lambda x, y: x.wup_similarity(y))
```

---

.center[
## Exercise
]

Homework: define a new function that takes a synset x as an argument and returns the cohyponyms of x = the synsets that share a hypernym with x

- Make sure x and its hyponyms are excluded
- Add a numeric argument to your function that specifies the minimum similarity between the hypernyms

---

name: last-page
class: middle, center, inverse

## That's all folks (for now)!

Slideshow created using [remark](http://github.com/gnab/remark).