
Modelling the combination of generic and target domain embeddings in a convolutional neural network for sentence classification

Published version
Peer-reviewed

Authors

Limsopatham, N 

Abstract

Word embeddings have been successfully exploited in systems for NLP tasks, such as parsing and text classification. Intuitively, word embeddings created from a larger corpus provide better vocabulary coverage. Meanwhile, word embeddings trained on a corpus related to the given task or target domain more effectively represent the semantics of its terms. However, in some emerging domains (e.g. bio-surveillance using social media data), it may be difficult to find a domain corpus large enough to create effective word embeddings. To deal with this problem, we propose novel approaches that use both word embeddings created from generic corpora and word embeddings created from target domain corpora. Our experimental results on sentence classification tasks show that our approaches significantly improve the performance of an existing convolutional neural network that achieved state-of-the-art performance on several text classification tasks.
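The abstract does not spell out how the two embedding sets are combined. As a hedged illustration only, the sketch below shows one simple option, assumed here rather than taken from the paper: concatenating each word's generic and target-domain vectors (zero-filling whichever is missing) and passing the resulting sentence matrix through a single convolutional filter with ReLU and max-over-time pooling. The function names (`combined_matrix`, `conv_max_pool`), the toy vectors and the filter weights are all hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def lookup(table: Dict[str, Vector], word: str, dim: int) -> Vector:
    """Return a copy of the word's vector, or a zero vector if it is out of vocabulary."""
    return list(table.get(word, [0.0] * dim))


def combined_matrix(tokens: Sequence[str],
                    generic: Dict[str, Vector], generic_dim: int,
                    domain: Dict[str, Vector], domain_dim: int) -> List[Vector]:
    """Hypothetical combination step: concatenate generic and domain vectors per token.

    A word missing from the (small) domain table still carries its generic
    vector, so the larger generic vocabulary can cover the domain table's gaps.
    """
    return [lookup(generic, t, generic_dim) + lookup(domain, t, domain_dim)
            for t in tokens]


def conv_max_pool(matrix: List[Vector], weights: List[Vector], bias: float) -> float:
    """Apply one filter spanning len(weights) consecutive words, then ReLU
    and max-over-time pooling, returning a single feature value."""
    h = len(weights)
    if len(matrix) < h:
        raise ValueError("sentence is shorter than the filter window")
    if any(len(w) != len(matrix[0]) for w in weights):
        raise ValueError("filter width must match the combined embedding size")
    features = []
    for start in range(len(matrix) - h + 1):
        s = bias
        for row, w in zip(matrix[start:start + h], weights):
            s += sum(x * y for x, y in zip(row, w))
        features.append(max(0.0, s))  # ReLU
    return max(features)  # max-over-time pooling


if __name__ == "__main__":
    # Toy tables (hypothetical): 2-d generic vectors, 1-d domain vectors.
    generic = {"flu": [0.1, 0.2], "feel": [0.3, 0.0], "sick": [0.0, 0.5]}
    domain = {"flu": [1.0], "sick": [0.8]}  # "feel" is missing from the domain table
    m = combined_matrix(["feel", "sick", "flu"], generic, 2, domain, 1)
    print(m)
    print(conv_max_pool(m, [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]], 0.0))
```

Concatenation lets the two tables have different dimensionalities. Stacking them as separate input channels is another common design, but it requires both tables to share the same vector size.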

Keywords

46 Information and Computing Sciences, 4611 Machine Learning, Machine Learning and Artificial Intelligence

Journal Title

BioNLP 2016 - Proceedings of the 15th Workshop on Biomedical Natural Language Processing

Volume Title

W16

Publisher

Association for Computational Linguistics

Sponsorship

Engineering and Physical Sciences Research Council (Grant ID: EP/M005089/1)