
Improved semantic representation for domain-specific entities

Published version
Peer-reviewed

Type

Article

Authors

Pilehvar, MT 

Abstract

Most existing corpus-based approaches to semantic representation suffer from inaccurate modeling of domain-specific lexical items, which either have low frequencies in open-domain corpora or do not occur in them at all. We put forward a technique that improves word embeddings in specific domains by first transforming a given lexical item into a sorted list of representative words and then modeling the item by combining the embeddings of these words. Our experiments show that the proposed technique can significantly improve several recent word embedding techniques when modeling a set of lexical items in the biomedical domain, namely phenotypes.
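The abstract does not say how the representative words are selected or exactly how their embeddings are combined. The sketch below is a minimal illustration under stated assumptions: the sorted list of representative words is already given, and the item vector is a rank-weighted average in which the word at position r (0-based) gets weight 1/(r+1). The function name `combine_ranked_embeddings`, the `top_k` cutoff and the 1/(r+1) weighting are all hypothetical choices for illustration, not the paper's method.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def combine_ranked_embeddings(
    ranked_words: Sequence[str],
    embeddings: Dict[str, Vector],
    top_k: int = 10,
) -> Vector:
    """Hypothetical sketch: represent a lexical item as a rank-weighted
    average of the embeddings of its representative words.

    Assumptions (not from the paper): the word at 0-based position r in
    ranked_words gets weight 1 / (r + 1), only the first top_k words are
    used, and words without an embedding are skipped.
    """
    total: List[float] = []
    weight_sum = 0.0
    for rank, word in enumerate(ranked_words[:top_k]):
        vec = embeddings.get(word)
        if vec is None:
            continue  # out-of-vocabulary words contribute nothing
        if not total:
            total = [0.0] * len(vec)  # initialise with the embedding dimension
        weight = 1.0 / (rank + 1)  # higher-ranked words count more
        for i, x in enumerate(vec):
            total[i] += weight * x
        weight_sum += weight
    if not total:
        raise ValueError("none of the representative words has an embedding")
    # Normalise so the result is a weighted mean, not a weighted sum.
    return [x / weight_sum for x in total]


if __name__ == "__main__":
    toy_embeddings = {"heart": [1.0, 0.0], "enlarged": [0.0, 1.0]}
    print(combine_ranked_embeddings(["heart", "enlarged", "rareword"], toy_embeddings))
```

With the toy vectors above, "heart" (weight 1) and "enlarged" (weight 0.5) yield roughly [0.667, 0.333], and the unknown word is ignored. A rank-decaying weight is one simple way to let the most representative words dominate; a plain unweighted mean would be an equally plausible alternative.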

Journal Title

BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Publisher

Association for Computational Linguistics

Sponsorship

Medical Research Council (MR/M025160/1)

The authors gratefully acknowledge the support of the MRC grant No. MR/M025160/1 for PheneBank.