Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model
Authors
Glavaš, G
Vulić, I
Publication Date
2019
Journal Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Conference Name
Proceedings of the 41st European Conference on Information Retrieval (ECIR 2019)
ISSN
0302-9743
ISBN
9783030157111
Publisher
Springer International Publishing
Volume
11437 LNCS
Pages
523-538
Type
Conference Object
This Version
Accepted Manuscript (AM)
Citation
Glavaš, G., & Vulić, I. (2019). Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11437 LNCS 523-538. https://doi.org/10.1007/978-3-030-15712-8_34
Abstract
We present a neural architecture for cross-lingual mate sentence retrieval which encodes sentences in a joint multilingual space and learns to distinguish true translation pairs from semantically related sentences across languages. The proposed model combines a recurrent sequence encoder with a bidirectional attention layer and an intra-sentence attention mechanism. This way, the final fixed-size sentence representations in each training sentence pair depend on the selection of contextualized token representations from the other sentence. The representations of both sentences are then combined using the bilinear product function to predict the relevance score. We show that, coupled with a shared multilingual word embedding space, the proposed model strongly outperforms unsupervised cross-lingual ranking functions, and that further boosts can be achieved by combining the two approaches. Most importantly, we demonstrate the model's effectiveness in zero-shot language transfer settings: our multilingual framework boosts cross-lingual sentence retrieval performance for unseen language pairs without any training examples. This enables robust cross-lingual sentence retrieval also for pairs of resource-lean languages, without any parallel data.
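For orientation only, the sketch below illustrates the kind of architecture the abstract describes: a shared recurrent encoder over multilingual word embeddings, a bidirectional (cross-sentence) attention layer, intra-sentence attention pooling, and a bilinear product producing the relevance score. It is a minimal PyTorch approximation under assumed details (the class name BiAttentionMateRetriever, layer sizes, and the softmax-based attention are all our assumptions), not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttentionMateRetriever(nn.Module):
    """Hypothetical sketch: shared BiGRU encoder, cross-sentence (bidirectional)
    attention, intra-sentence attention pooling, bilinear relevance scoring."""

    def __init__(self, emb_dim=300, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden
        self.intra = nn.Linear(d, 1)          # intra-sentence attention scorer
        self.bilinear = nn.Bilinear(d, d, 1)  # bilinear product for the relevance score

    def pool(self, states):
        # Intra-sentence attention pooling: (B, T, d) -> (B, d)
        weights = F.softmax(self.intra(states).squeeze(-1), dim=-1)
        return torch.bmm(weights.unsqueeze(1), states).squeeze(1)

    def forward(self, src_emb, tgt_emb):
        # src_emb, tgt_emb: sentences as sequences of (shared multilingual)
        # word embeddings, shape (B, T, emb_dim)
        h_s, _ = self.encoder(src_emb)
        h_t, _ = self.encoder(tgt_emb)
        # Bidirectional attention: each sentence's token representations are
        # re-contextualized via the other sentence's tokens.
        att = torch.bmm(h_s, h_t.transpose(1, 2))                       # (B, Ts, Tt)
        s_ctx = torch.bmm(F.softmax(att, dim=-1), h_t)                  # source attends to target
        t_ctx = torch.bmm(F.softmax(att.transpose(1, 2), dim=-1), h_s)  # target attends to source
        v_s, v_t = self.pool(s_ctx), self.pool(t_ctx)
        return self.bilinear(v_s, v_t).squeeze(-1)                      # relevance score per pair
```

In such a setup, the model would be trained to rank true translation pairs above semantically related non-translations, and zero-shot transfer would follow from keeping the encoder shared across languages through the joint embedding space.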
Sponsorship
European Research Council (648909)
Identifiers
External DOI: https://doi.org/10.1007/978-3-030-15712-8_34
This record's URL: https://www.repository.cam.ac.uk/handle/1810/290531