
dc.contributor.author: Litschko, R
dc.contributor.author: Vulić, I
dc.contributor.author: Ponzetto, SP
dc.contributor.author: Glavaš, G
dc.date.accessioned: 2021-09-02T10:51:01Z
dc.date.available: 2021-09-02T10:51:01Z
dc.date.issued: 2021
dc.identifier.isbn: 9783030721121
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/327496
dc.description.abstract: Pretrained multilingual text encoders based on neural Transformer architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong performance on a myriad of language understanding tasks. Consequently, they have been adopted as a go-to paradigm for multilingual and cross-lingual representation learning and transfer, rendering cross-lingual word embeddings (CLWEs) effectively obsolete. However, questions remain as to the extent to which this finding generalizes 1) to unsupervised settings and 2) to ad-hoc cross-lingual IR (CLIR) tasks. Therefore, in this work we present a systematic empirical study of the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR (a setup with no relevance judgments for IR-specific fine-tuning), pretrained encoders fail to significantly outperform models based on CLWEs. For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved. However, peak performance is not reached by using the general-purpose multilingual text encoders "off-the-shelf", but rather by relying on their variants that have been further specialized for sentence understanding tasks.
dc.publisher: Springer International Publishing
dc.rights: All rights reserved
dc.title: Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval
dc.type: Conference Object
prism.endingPage: 358
prism.publicationDate: 2021
prism.publicationName: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
prism.startingPage: 342
prism.volume: 12656 LNCS
dc.identifier.doi: 10.17863/CAM.74949
dcterms.dateAccepted: 2020-12-16
rioxxterms.versionofrecord: 10.1007/978-3-030-72113-8_23
rioxxterms.version: AM
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved
rioxxterms.licenseref.startdate: 2021-01-01
dc.identifier.eissn: 1611-3349
rioxxterms.type: Conference Paper/Proceeding/Abstract
pubs.funder-project-id: European Research Council (648909)
cam.issuedOnline: 2021-03-27
pubs.conference-name: Proceedings of the 43rd European Conference on Information Retrieval (ECIR 2021)
cam.orpheus.success: Mon Sep 06 07:30:36 BST 2021 - Embargo updated
rioxxterms.freetoread.startdate: 2022-01-01