Specialising Distributional Vectors of All Words for Lexical Entailment
Abstract
Semantic specialization methods fine-tune distributional word vectors using lexical knowledge from external resources (e.g., WordNet) to accentuate a particular relation between words. However, such post-processing methods suffer from limited coverage, as they affect only the vectors of words \textit{seen} in the external resources. We present the first post-processing method that specializes vectors of \textit{all vocabulary words} -- including those \textit{unseen} in the resources -- for the \textit{asymmetric} relation of lexical entailment (\textsc{le}), i.e., the hyponymy-hypernymy relation. Leveraging a partially \textsc{le}-specialized distributional space, our \textsc{postle} (\textit{post-specialization} for \textsc{le}) model learns an explicit global specialization function, which allows it to specialize vectors of unseen words, as well as word vectors from other languages via cross-lingual transfer. We capture the function as a deep feed-forward neural network: its objective re-scales vector norms to reflect the concept hierarchy while simultaneously attracting hyponymy-hypernymy pairs to better reflect semantic similarity. An extended model variant augments the basic architecture with an adversarial discriminator. We demonstrate the usefulness and versatility of \textsc{postle} models with different input distributional spaces in different scenarios (monolingual \textsc{le} and zero-shot cross-lingual \textsc{le} transfer) and tasks (binary and graded \textsc{le}). We report consistent gains over state-of-the-art \textsc{le}-specialization methods, and successfully \textsc{le}-specialize word vectors for languages without any external lexical knowledge.
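A minimal sketch may help make the learned specialization function concrete. The PyTorch code below is illustrative only, not the paper's released implementation: the layer sizes, the particular combination of a cosine term with a norm-matching term, and all names (\texttt{PostLEMapper}, \texttt{le\_mapping\_loss}) are our assumptions. It trains a deep feed-forward mapper on pairs of original and partially \textsc{le}-specialized vectors of \textit{seen} words, so that at inference time the same mapper can \textsc{le}-specialize vectors of \textit{unseen} words.

\begin{verbatim}
# Hedged sketch of a post-specialization mapping (not the paper's code).
import torch
import torch.nn as nn

class PostLEMapper(nn.Module):
    """Deep feed-forward net mapping original distributional vectors
    to LE-specialized vectors (the global specialization function)."""
    def __init__(self, dim=300, hidden=512, depth=5):
        super().__init__()
        layers, in_dim = [], dim
        for _ in range(depth):
            layers += [nn.Linear(in_dim, hidden), nn.Tanh()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def le_mapping_loss(pred, gold):
    # Attract predictions to their gold LE-specialized targets
    # (cosine term) and match vector norms, since the norm is meant
    # to encode position in the concept hierarchy (norm term).
    cos = 1.0 - nn.functional.cosine_similarity(pred, gold, dim=-1).mean()
    norm = (pred.norm(dim=-1) - gold.norm(dim=-1)).abs().mean()
    return cos + norm

# Toy training step on random stand-ins for (distributional,
# LE-specialized) vector pairs of words *seen* in the resource.
model = PostLEMapper()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x_seen = torch.randn(64, 300)   # original distributional vectors
y_seen = torch.randn(64, 300)   # partially LE-specialized targets
opt.zero_grad()
loss = le_mapping_loss(model(x_seen), y_seen)
loss.backward()
opt.step()
# At inference, model(x_unseen) LE-specializes unseen words' vectors.
\end{verbatim}

The adversarial variant described in the abstract would add a discriminator trained to distinguish mapped vectors from genuine \textsc{le}-specialized ones; that component is omitted from this sketch.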