Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model
Authors
Glavaš, G
Vulić, I
Publication Date
2019
Journal Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Conference Name
Proceedings of the 41st European Conference on Information Retrieval (ECIR 2019)
ISSN
0302-9743
ISBN
9783030157111
Publisher
Springer International Publishing
Volume
11437 LNCS
Pages
523-538
Type
Conference Object
This Version
Accepted Manuscript (AM)
Citation
Glavaš, G., & Vulić, I. (2019). Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11437 LNCS 523-538. https://doi.org/10.1007/978-3-030-15712-8_34
Abstract
We present a neural architecture for cross-lingual mate sentence retrieval which encodes sentences in a joint multilingual space and learns to distinguish true translation pairs from semantically related sentences across languages. The proposed model combines a recurrent sequence encoder with a bidirectional attention layer and an intra-sentence attention mechanism. This way, the final fixed-size sentence representations in each training sentence pair depend on the selection of contextualized token representations from the other sentence. The representations of both sentences are then combined using the bilinear product function to predict the relevance score. We show that, coupled with a shared multilingual word embedding space, the proposed model strongly outperforms unsupervised cross-lingual ranking functions, and that further boosts can be achieved by combining the two approaches. Most importantly, we demonstrate the model's effectiveness in zero-shot language transfer settings: our multilingual framework boosts cross-lingual sentence retrieval performance for unseen language pairs without any training examples. This enables robust cross-lingual sentence retrieval also for pairs of resource-lean languages, without any parallel data.
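For orientation only, the sketch below illustrates the kind of architecture the abstract describes: a shared recurrent encoder over multilingual word embeddings, a bidirectional (cross-sentence) attention layer, intra-sentence attention pooling, and a bilinear product producing the relevance score. It is a minimal PyTorch approximation under assumed details (the class name BiAttentionMateRetriever, layer sizes, and the softmax-based attention are all our assumptions), not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttentionMateRetriever(nn.Module):
    """Hypothetical sketch: shared BiGRU encoder, cross-sentence (bidirectional)
    attention, intra-sentence attention pooling, bilinear relevance scoring."""

    def __init__(self, emb_dim=300, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        d = 2 * hidden
        self.intra = nn.Linear(d, 1)          # intra-sentence attention scorer
        self.bilinear = nn.Bilinear(d, d, 1)  # bilinear product for the relevance score

    def pool(self, states):
        # Intra-sentence attention pooling: (B, T, d) -> (B, d)
        weights = F.softmax(self.intra(states).squeeze(-1), dim=-1)
        return torch.bmm(weights.unsqueeze(1), states).squeeze(1)

    def forward(self, src_emb, tgt_emb):
        # src_emb, tgt_emb: sentences as sequences of (shared multilingual)
        # word embeddings, shape (B, T, emb_dim)
        h_s, _ = self.encoder(src_emb)
        h_t, _ = self.encoder(tgt_emb)
        # Bidirectional attention: each sentence's token representations are
        # re-contextualized via the other sentence's tokens.
        att = torch.bmm(h_s, h_t.transpose(1, 2))                       # (B, Ts, Tt)
        s_ctx = torch.bmm(F.softmax(att, dim=-1), h_t)                  # source attends to target
        t_ctx = torch.bmm(F.softmax(att.transpose(1, 2), dim=-1), h_s)  # target attends to source
        v_s, v_t = self.pool(s_ctx), self.pool(t_ctx)
        return self.bilinear(v_s, v_t).squeeze(-1)                      # relevance score per pair
```

In such a setup, the model would be trained to rank true translation pairs above semantically related non-translations, and zero-shot transfer would follow from keeping the encoder shared across languages through the joint embedding space.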
Sponsorship
European Research Council (648909)
Identifiers
External DOI: https://doi.org/10.1007/978-3-030-15712-8_34
This record's URL: https://www.repository.cam.ac.uk/handle/1810/290531