Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse varius enim in eros elementum tristique. Duis cursus, mi quis viverra ornare, eros dolor interdum nulla, ut commodo diam libero vitae erat. Aenean faucibus nibh et justo cursus id rutrum lorem imperdiet. Nunc ut sem vitae risus tristique posuere.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
Block quote
Ordered list
Unordered list
Bold text
Emphasis
Superscript
Subscript
We enrich the self-supervision strategy employed in BERT by applying self-supervision on the Word-Sense level. Our model, SenseBERT, achieves significantly improved lexical disambiguation abilities, setting state-of-the-art results on the Word in Context (WiC) task.
The use of self-supervision has allowed neural language models to advance the frontier in Natural Language Understanding. Specifically, the self-supervision strategy employed in BERT, which involves masking some of the words in an input sentence and training the model to predict them given their context, results in a versatile language model suited for many tasks. Nevertheless, existing self-supervision methods operate on the word-form level, and as such, they are inherently limited. Take the word ‘bass’ as an example: in one context it can refer to a fish, in a second to a guitar, in the third to a type of singer, and so on. The word itself is merely a surrogate of its actual meaning in a given context - referred to as its sense.
Word-Sense predictions made by SenseBERT on a raw input text.
The word "Bass" maps to its different supersenses according to context.
We propose a method to employ self-supervision directly at the word sense level. Our model, named SenseBERT, is pre-trained to predict not only the masked words (as in BERT) but also their WordNet supersenses. BERT’s current word-form level objective leads to what is commonly referred to as a language model: the model predicts words for the given masked position in a sentence. Our added task allows SenseBERT to gather word-sense level statistics and thereby we effectively train a semantic-level language model that predicts the missing word’s meaning jointly with the standard word-form level prediction.
Self-supervision allows neural language models to learn from massive amounts of unannotated text. In order to enjoy this advantage at the lexical-semantic level, we make use of WordNet - an expert-constructed ontology that provides an inventory of word senses. We focus on a coarse-grained variant of word senses, referred to as supersenses, divided into 45 types that semantically categorize the entire language.
Labeling of words with a single supersense, for example the word ‘sword’ which has only the supersense ‘artifact’, is straightforward: We train the network to predict this supersense given the masked word’s context. As for words with multiple supersenses (like the word ‘bass’ we discussed earlier), we train the model to predict any of these senses, leading to a simple yet effective labeling scheme.
Other than our additional pre-training task and the architecture changes that support it, we followed the architecture and training methods published in the BERT paper. Specifically, we trained models in two sizes, similar to BERT model sizes - SenseBERTBASE and SenseBERTLARGE. Our results below show significantly enhanced word meaning understanding abilities of SenseBERT, achieved without human annotation.
Word-Sense predictions made by SenseBERT on the masked word in the sentence.
In order to test our model’s lexical semantic abilities, we compared it with BERT on two tasks:
1. SemEval Word Superense Disambiguation
We constructed a supersense variant of the SemEval-based WSD task, where the goal is to predict a word’s supersense given context.
We trained and tested in two different schemes:
SenseBERTBASE outscores both BERTBASE and BERTLARGE models by a large margin in both cases, and SenseBERTLARGE yields even higher results.
2. Word in Context (WiC)
The Word in Context (WiC) task, from the SuperGLUE benchmark, is the task of predicting whether a word has the same meaning in two given sentences. This task heavily depends on word-supersense awareness.
SenseBERTBASE surpasses BERTLARGE, and a single SenseBERTLARGE model achieves state-of-the-art performance on this task with a score of 72.14.