A deep learning approach to bilingual lexicon induction in the biomedical domain

Heyman, Geert; Vulić, Ivan; Moens, Marie-Francine

doi:10.1186/s12859-018-2245-8

Repository URI

https://www.repository.cam.ac.uk/handle/1810/277925

Repository DOI

https://doi.org/10.17863/CAM.25260

Files

12859_2018_Article_2245.pdf (1.9 MB)

Type

Journal Article

Authors

Heyman, Geert

Vulić, Ivan

Moens, Marie-Francine

Abstract

Background: Bilingual lexicon induction (BLI) is an important task in the biomedical domain as translation resources are usually available for general language usage, but are often lacking in domain-specific settings. In this article we consider BLI as a classification problem and train a neural network composed of a combination of recurrent long short-term memory and deep feed-forward networks in order to obtain word-level and character-level representations.

Results: The results show that the word-level and character-level representations each improve state-of-the-art results for BLI and biomedical translation mining. The best results are obtained by exploiting the synergy between these word-level and character-level representations in the classification model. We evaluate the models both quantitatively and qualitatively.

Conclusions: Translation of domain-specific biomedical terminology benefits from the character-level representations compared to relying solely on word-level representations. It is beneficial to take a deep learning approach and learn character-level representations rather than relying on handcrafted representations that are typically used. Our combined model captures the semantics at the word level while also taking into account that specialized terminology often originates from a common root form (e.g., from Greek or Latin).

Publisher DOI

https://doi.org/10.1186/s12859-018-2245-8

Collections

BioMed Central Publications
