Improved semantic representation for domain-specific entities
Published version
Peer-reviewed
Authors
Pilehvar, MT
Collier, Nigel https://orcid.org/0000-0002-7230-4164
Abstract
Most existing corpus-based approaches to semantic representation model domain-specific lexical items inaccurately, because such items are either rare or absent in open-domain corpora. We put forward a technique that improves word embeddings in specific domains by first transforming a given lexical item into a sorted list of representative words and then modeling the item by combining the embeddings of these words. Our experiments show that the proposed technique can significantly improve some recent word embedding techniques when modeling a set of biomedical lexical items, namely phenotypes.
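The abstract describes the combination step only at a high level. Below is a minimal sketch of that step. It assumes the representative words have already been ranked, and that their vectors are merged by a rank-weighted average with weight 1/rank. The weighting scheme, the function name `embed_item`, and the toy vectors are all illustrative assumptions, not the authors' specification.

```python
from typing import Dict, List, Sequence


def embed_item(
    ranked_words: Sequence[str],
    embeddings: Dict[str, List[float]],
) -> List[float]:
    """Hypothetical sketch: represent a lexical item by combining the
    embeddings of its ranked representative words.

    Assumption: the word at rank r (starting at 1) gets weight 1/r, and
    words without an embedding are skipped. Returns the weighted mean.
    """
    if not embeddings:
        raise ValueError("embedding table is empty")
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for rank, word in enumerate(ranked_words, start=1):
        vec = embeddings.get(word)
        if vec is None:
            continue  # out-of-vocabulary word: skip it but keep its rank slot
        weight = 1.0 / rank  # assumed rank-based weighting
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("none of the representative words has an embedding")
    return [x / weight_sum for x in total]
```

For example, with toy vectors `{"bone": [1.0, 0.0], "fracture": [0.0, 1.0]}`, calling `embed_item(["fracture", "bone"], ...)` weights "fracture" twice as heavily as "bone". A decaying weight is one simple way to make the sorted order matter. Other schemes, such as uniform or similarity-based weights, would fit the same interface.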
Journal Title
BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing
Publisher
Association for Computational Linguistics
Sponsorship
Medical Research Council (MR/M025160/1)
The authors gratefully acknowledge the support of the MRC grant No. MR/M025160/1 for PheneBank.