class: center, middle
background-image:url(images/data-background-light.jpg)

# Lexical Resources
## Master TAL, Nancy, 2019-2020
#### Christophe Cerisara

.footnote[.bold[cerisara@loria.fr - CNRS / LORIA]]

---

.center[
## Lab on Word Embeddings
]

- Mark: out of 5
- I've emailed the mark to everyone from whom I received something: if you sent something but have not received an email, please let me know before Friday!

---

.center[
## Lab on Word Embeddings
]

- Classic error 1: bad models (overfitting)
    - subwords -> good for OOV, but overfitting is easier
    - "information", "informations"
    - In the embedding space, rare words end up in arbitrary regions

---

.center[
## Lab on Word Embeddings
]

- Classic error 2: train on test
    - **NEVER**!!
- Power of transfer learning
- Good data is more data

---

.center[
## FrameNet
]

- Original theory: C. J. Fillmore, 1976

---

.center[
## FrameNet
]

- Original theory: C. J. Fillmore, 1976
- FrameNet project:
    - Developed by ICSI Berkeley
    - Online demo:
        - https://framenet.icsi.berkeley.edu/fndrupal/
    - Detailed documentation:
        - https://framenet2.icsi.berkeley.edu/docs/r1.7/book.pdf

---

.center[
## Using FrameNet
]

- Getting started:

```
from nltk.corpus import framenet as fn
```

---

.center[
## FrameNet basic notions
]

- **Frame**: official definition
    - "Script-like conceptual structure that describes a particular type of situation, object, or event along with the participants and props that are needed for that Frame"

```
fn.frames()
fn.frame('Motion')
f=fn.frame(7)
```

---

.center[
## FrameNet basic notions
]

- **Lexical Unit** (LU) ~= lemma in WordNet: a word taken in a specific meaning
- LUs evoke frames, e.g.
Commerce_buy is evoked by:
    - buy.v
    - buyer.n
    - client.n
    - purchase [act].n
    - purchase.v
    - purchaser.n

---

.center[
## FrameNet basic notions
]

- **Lexical Unit** (LU) ~= lemma in WordNet: a word taken in a specific meaning

```
fn.lus()
fn.lu(4896)
```

---

.center[
## FrameNet basic notions
]

- **Frame elements** (FE) = roles in a frame

```
f.FE
```

---

.center[
## FrameNet basic notions
]

- **Relations** between frames:
    - Inheritance
    - Using: the child frame presupposes the parent frame as background, e.g. the "Speed" frame "uses" (or presupposes) the "Motion" frame
    - Subframe: the child frame is a subevent of a complex event represented by the parent, e.g. the "Criminal_process" frame has the subframes "Arrest", "Arraignment", "Trial", and "Sentencing"
    - …

```
f.frameRelations
```

---

.center[
## FrameNet basic notions
]

- **Relations** between frame elements: `fn.fe_relations()`
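---

.center[
## FrameNet basic notions
]

The relation records returned by `f.frameRelations` and `fn.fe_relations()` can be grouped by relation type. The sketch below is only an illustration: it runs on hand-made dictionaries, and the keys `type`, `name`, `superFrameName` and `subFrameName` are assumptions about how NLTK lays out these records, so check them with `help(fn)` before relying on them. The helper name `group_relations_by_type` is hypothetical.

```python
from collections import defaultdict


def group_relations_by_type(relations):
    """Group relation records by the name of their relation type."""
    groups = defaultdict(list)
    for rel in relations:
        # Assumed layout: each record maps "type" to a record holding a "name"
        groups[rel["type"]["name"]].append(rel)
    return dict(groups)


# Hand-made stand-ins built from the examples on the previous slide
# (not real FrameNet data)
demo = [
    {"type": {"name": "Using"},
     "superFrameName": "Motion", "subFrameName": "Speed"},
    {"type": {"name": "Subframe"},
     "superFrameName": "Criminal_process", "subFrameName": "Arrest"},
    {"type": {"name": "Subframe"},
     "superFrameName": "Criminal_process", "subFrameName": "Trial"},
]

for type_name, rels in group_relations_by_type(demo).items():
    pairs = [(r["superFrameName"], r["subFrameName"]) for r in rels]
    print(type_name, pairs)
```

If the assumed keys match your NLTK version, the same function should accept `f.frameRelations` directly, since those records behave like mappings.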
---

.center[
## FrameNet basic notions
]

- Data type
    - FrameNet has chosen to implement everything as a mapping "keys" -> "values"
    - look at `type(f.frameRelations[0])`
    - look at `type(fn.fe_relations()[0])`
- So to know which type of relation you're manipulating:

```
f.frameRelations[0]._type
fn.fe_relations()[0]._type
```

---

.center[
## FrameNet corpora
]

- Full-text annotations = part of SemEval-07 shared task 19.
    - Texts from journals: 14k distinct words, 4020 sentences with LUs

---

.center[
## FrameNet corpora
]

- Full-text annotations = part of SemEval-07 shared task 19.
    - Texts from journals: 14k distinct words, 4020 sentences with LUs
- Exemplar sentences with partial annotations
    - Crafted texts: 132k distinct words, 141k sentences

---

.center[
## FrameNet corpora
]

- list all documents:

```
fn.docs()
```

- list the metadata of all documents:

```
fn.docs_metadata()
```

- access a specific document:

```
fn.doc(id)
```

---

.center[
## FrameNet corpora
]

- access a sentence in a document, and its annotations:

```
fn.doc(6).sentence[0]
fn.doc(6).sentence[0].annotationSet[0]
```

---

.center[
## FrameNet corpora
]

- get all sentences with a frame:

```
fn.exemplars(frame='Motion')
```

---

.center[
## FrameNet corpora
]

- get all sentences with a lexical unit:

```
fn.exemplars('run')
```

---

.center[
## Exercise
]

Find the sentence and its annotations:
---

.center[
## Exercise
]

```
a=[x for x in fn.exemplars('waltz') if 'elbowing' in x.text]
```

---

.center[
## Searching
]

- Search in frames:

```
fn.frames('Motion')
```

- Is it case-sensitive?
- Use regular expressions

---

.center[
## Searching
]

- Search in lexical units:

```
fn.lus('xpress')
```

---

.center[
## Searching
]

- Retrieving a lexical unit by ID:

```
g=fn.lu(5372)
g.ID
g.definition
g.name
```

- Get the frame evoked by an LU:

```
g.frame
g.frame.name
```

---

.center[
## Exercise
]

- How many frames are there in your version of FrameNet?
- How many lexical units?

---

.center[
## Exercise
]

- How many frames are there in your version of FrameNet?

```
len(fn.frames())
```

- How many lexical units?

```
len(fn.lus())
```

---

.center[
## Getting help
]

- Google search is very good
- When offline:

```
help(fn)
```

---

.center[
## Exercise
]

List the names of the frames that are evoked by the LU run.v

---

.center[
## Exercise
]

List the names of the frames that are evoked by the LU run.v

```
[x.frame.name for x in fn.lus('run') if x.name=='run.v']
```

---

.center[
## Exercise
]

- Using the help() function, find which attributes describe the lemma of an LU.

---

.center[
## Exercise
]

- Find all LUs that share the frame with 'car'
- Print them along with their definition
- Find all frames whose name includes "contain" (first letter capitalized or not)

---

.center[
## Exercise
]

- Find all LUs that share the frame with 'car'

```
# escape the dot and anchor the end so that only the LU named "car.n" matches
lus = fn.lus(r'^car\.n$')[0].frame.lexUnit
lus = [lus[lu] for lu in lus]
```

---

.center[
## Exercise
]

- Print them along with their definition

```
for lu in lus:
    print(lu.name, lu.definition)
```

---

.center[
## Exercise
]

- Find all frames whose name includes "contain" (first letter capitalized or not)

```
fn.frames('[Cc]ontain')
```

---

.center[
## Exercise
]

- Useful attributes of a frame:
    - name, definition, FE, lexUnit, frameRelations
- What relations does the Giving frame have?
- Print the FE of Giving
- Print the core FE of Giving

---

.center[
## Exercise
]

- What relations does the Giving frame have?

```
give = fn.frame('Giving')
give.frameRelations
```

- Print the FE of Giving

```
print(give.FE)
```

- Print the core FE of Giving

```
# give.FE is a mapping from FE name to FE record, so index it with []
for fe_name in give.FE:
    fe = give.FE[fe_name]
    if fe.coreType == 'Core':
        print(fe.name)
```

---

.center[
## FrameNet, WordNet, VerbNet
]

From Shi et al. 2005:

- FrameNet provides a good generalization across predicates using frames and semantic roles
- FrameNet does not explicitly define selectional restrictions for semantic roles
- VerbNet defines syntactic-semantic relations in a more explicit way
- One of the most useful properties of WordNet is its almost complete coverage of English verbs, and the rich information it encodes about semantic relations between verb senses

---

.center[
## ConceptNet
]

- Integrates information from many sources: Wiktionary, DBpedia, WordNet, etc.
- Online demo:
    - http://conceptnet.io
- Code:
    - https://github.com/commonsense/conceptnet5

---

.center[
## ConceptNet
]

- Network of words joined by relations (>30):
    - RelatedTo (stick - branch)
    - FormOf (sticks - stick)
    - Location (stick - forest)
    - PartOf (stick - ice hockey)
    - Antonyms (stick - automatic)
    - UsedFor (stick - force)
    - …

---

.center[
## VerbNet
]

- Description of the argument structures of verbs
- Theoretical basis: Levin, B. (1993), University of Chicago Press
- Basic unit = verb class
- Different verbs in the same class have the same argument structure

---

.center[
## VerbNet
]

Verb classes

- Each class is characterized by:
    - A set of thematic roles (participants in the event)
    - Restrictions on role fillers (for participants)
    - Syntactic frames (how participants can be expressed syntactically)

---

.center[
## VerbNet
]

Example

- Class *Hit-18.1* (bang, bash, hit, kick...)
- Roles and restrictions:

| Paula hit      | the ball    | with a stick |
|----------------|-------------|--------------|
| Agent          | Patient     | Instrument   |
| [+int_control] | [+concrete] | [+concrete]  |

(int_control = intentional_control)

---

.center[
## VerbNet
]

Thematic roles

- **Agent**: subject of action
- **Patient**: participant undergoing an action
- **Theme**: Undergoer that is central to an event or state that does not have control over the way the event occurs, is not structurally changed by the event, and/or is characterized as being in a certain position or condition throughout the state.
- **Instrument**
- **Beneficiary**
- ... (see https://verbs.colorado.edu/verb-index/vn3.3/themroles/ )

---

.center[
## VerbNet
]

Selectional restriction classes
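---

.center[
## VerbNet
]

Selectional restrictions can be read as feature checks on role fillers. The toy sketch below encodes the *Hit-18.1* restrictions from the earlier example as plain dictionaries. It is not VerbNet's actual data model: the names `HIT_18_1`, `satisfies` and `check_roles`, and the feature values given to the fillers, are all hypothetical.

```python
# Toy encoding of the Hit-18.1 restrictions: role -> required feature values
HIT_18_1 = {
    "Agent": {"int_control": True},
    "Patient": {"concrete": True},
    "Instrument": {"concrete": True},
}


def satisfies(filler_features, restrictions):
    """Return True if the filler carries every required feature value."""
    return all(
        filler_features.get(feature) == value
        for feature, value in restrictions.items()
    )


def check_roles(role_fillers, verb_class):
    """Check each role filler against the restrictions of its role."""
    return {
        role: satisfies(features, verb_class[role])
        for role, features in role_fillers.items()
    }


# "Paula hit the ball with a stick", with hand-assigned (hypothetical) features
fillers = {
    "Agent": {"int_control": True, "concrete": True},   # Paula
    "Patient": {"concrete": True},                      # the ball
    "Instrument": {"concrete": True},                   # a stick
}
print(check_roles(fillers, HIT_18_1))

# An abstract Patient (e.g. "the idea") violates [+concrete]
print(satisfies({"concrete": False}, HIT_18_1["Patient"]))
```

In VerbNet itself the restriction classes form a hierarchy, so a faithful check would also have to handle inherited features; the flat dictionaries here ignore that.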
---

.center[
## VerbNet
]

VerbNet frames

- In VerbNet, syntactic frames provide mappings between the verb's arguments and their syntactic expression

Example for **collaborate**:

Frame 1:

| **NP V PP.theme** | "They collaborated on the task"            |
|-------------------|--------------------------------------------|
| *syntax*          | Agent <+plural> V {on} Theme <-sentential> |

Frame 2:

| **NP V PP.theme S_ING** | "They collaborated in finishing the task" |
|-------------------------|-------------------------------------------|
| *syntax*                | Agent <+plural> V {in} Theme <+sc_ing>    |

---

.center[
## VerbNet
]

## VerbNet in Python

Requires Python 3!

```
import nltk
nltk.download('verbnet')
from nltk.corpus import verbnet as vn
```

---

.center[
## VerbNet
]

List all lemmas

```
vn.lemmas()
```

List VerbNet classes for a lemma:

```
vn.classids(lemma='withdraw')
```

Exercise: write a function that tells whether the class sets of two lemmas are identical, disjoint, or overlapping

---

.center[
## VerbNet
]

Retrieving a class

```
withdraw = vn.vnclass('withdraw-82-1')
```

Get lemmas for a class

```
vn.lemmas(vnclass=withdraw)
```

Exercise: write a function vn_neighbors(v) that returns the list of lemmas that share a VerbNet class with the verb v

---

.center[
## VerbNet
]

Frames of a class

```
vn.frames(withdraw)
```

Exercise: How many syntactic frames are listed in VerbNet?

---

.center[
## VerbNet
]

Frame attributes

```
wd1 = vn.frames(withdraw)[0]
wd1['description']
wd1['example']
wd1['syntax']
wd1['semantics']
```

---

.center[
## VerbNet
]

Exercise

- How many syntactic frames in VerbNet have a distinct syntax?
- Extract a list of verbs that can be used in the Basic intransitive syntactic frame (i.e., subject NP + verb). Hint: look at frame descriptions

---
name: last-page
class: middle, center, inverse

## That's all folks (for now)!

Slideshow created using [remark](http://github.com/gnab/remark).