
Neural Word Representations for Biomedical NLP






Word representations are mathematical objects that capture the semantic and syntactic properties of words in a machine-interpretable form. Recently, encoding word properties into a low-dimensional vector space using neural networks has become popular. Neural representations are now used as the main input to Natural Language Processing (NLP) applications and achieve cutting-edge results in most areas of NLP. Our work extends the usefulness of neural representations, with a particular emphasis on the biomedical domain, which is linguistically highly challenging. We focus on three directions. First, we present a comprehensive study of how the quality of a representation model varies with its training parameters. For this, we implement a set of well-established models under different training settings (size of the input corpora, model architecture and hyper-parameters) and evaluate them thoroughly using standard methods. Our best model significantly outperforms the baseline, demonstrating the high impact of training parameters and the necessity of optimizing them. The study provides an important reference for researchers using neural representations for biomedical NLP. Second, we introduce two novel datasets for evaluating noun and verb representations in biomedicine. These datasets are designed to be consistent with those available for mainstream NLP. They enable, for the first time, evaluation of verb representations in the domain. Last, we propose a neural approach to facilitate the development of a VerbNet-style classification in biomedicine: we start from a small manual classification of biomedical verbs and apply a state-of-the-art neural representation model, developed explicitly for verb optimization, to expand that classification with new members. Evaluation of the resulting resource shows promising results when representation learning is performed using verb-related contexts.
Additionally, our human- and task-based evaluations reveal that the automatically created resource is highly accurate, suggesting that our method can facilitate cost-effective development of verb resources in biomedicine.





Korhonen, Anna


word embedding, biomedical natural language processing, biomedical verb intrinsic evaluation dataset


Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge