Word embeddings for biomedical natural language processing: A survey

Accepted version
Peer-reviewed

Type

Article

Authors

Abstract

Word representations are mathematical objects that capture the semantic and syntactic properties of words in a way that is interpretable by machines. Recently, encoding word properties into low‐dimensional vector spaces using neural networks has become increasingly popular. Word embeddings are now used as the main input to natural language processing (NLP) applications, achieving cutting‐edge results. Nevertheless, most word‐embedding studies are carried out with general‐domain text and evaluation datasets, and their results do not necessarily apply to text from other domains (e.g., biomedicine) that are linguistically distinct from general English. To achieve maximum benefit when using word embeddings for biomedical NLP tasks, they need to be induced and evaluated using in‐domain resources. Thus, it is essential to create a detailed review of biomedical embeddings that can be used as a reference for researchers to train in‐domain models. In this paper, we review biomedical word embedding studies from three key aspects: the corpora, models and evaluation methods. We first describe the characteristics of various biomedical corpora, and then compare popular embedding models. After that, we discuss different evaluation methods for biomedical embeddings. For each aspect, we summarize the various challenges discussed in the literature. Finally, we conclude the paper by proposing future directions that will help advance research into biomedical embeddings.

Description

Keywords

biomedical NLP, evaluation, word embeddings

Journal Title

Language and Linguistics Compass

Conference Name

Journal ISSN

1749-818X

Volume Title

14

Publisher

Wiley

Rights

All rights reserved