Show simple item record

dc.contributor.author: Litschko, R [en]
dc.contributor.author: Vulić, I [en]
dc.contributor.author: Ponzetto, SP [en]
dc.contributor.author: Glavaš, G [en]
dc.date.accessioned: 2021-09-02T10:51:01Z
dc.date.available: 2021-09-02T10:51:01Z
dc.date.issued: 2021-01-01 [en]
dc.identifier.isbn: 9783030721121 [en]
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/327496
dc.description.abstract: Pretrained multilingual text encoders based on neural Transformer architectures, such as multilingual BERT (mBERT) and XLM, have achieved strong performance on a myriad of language understanding tasks. Consequently, they have been adopted as a go-to paradigm for multilingual and cross-lingual representation learning and transfer, rendering cross-lingual word embeddings (CLWEs) effectively obsolete. However, questions remain as to what extent this finding generalizes 1) to unsupervised settings and 2) to ad-hoc cross-lingual IR (CLIR) tasks. Therefore, in this work we present a systematic empirical study focused on the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a large number of language pairs. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a setup with no relevance judgments for IR-specific fine-tuning -- pretrained encoders fail to significantly outperform models based on CLWEs. For sentence-level CLIR, we demonstrate that state-of-the-art performance can be achieved. However, the peak performance is not reached by using the general-purpose multilingual text encoders "off-the-shelf", but rather by relying on variants that have been further specialized for sentence understanding tasks.
dc.rights: All rights reserved
dc.rights.uri:
dc.title: Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval [en]
dc.type: Conference Object
prism.endingPage: 358
prism.publicationDate: 2021 [en]
prism.publicationName: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) [en]
prism.startingPage: 342
prism.volume: 12656 LNCS [en]
dc.identifier.doi: 10.17863/CAM.74949
dcterms.dateAccepted: 2020-12-16 [en]
rioxxterms.versionofrecord: 10.1007/978-3-030-72113-8_23 [en]
rioxxterms.version: AM
rioxxterms.licenseref.uri: http://www.rioxx.net/licenses/all-rights-reserved [en]
rioxxterms.licenseref.startdate: 2021-01-01 [en]
dc.identifier.eissn: 1611-3349
rioxxterms.type: Conference Paper/Proceeding/Abstract [en]
pubs.funder-project-id: ECH2020 EUROPEAN RESEARCH COUNCIL (ERC) (648909)
cam.orpheus.success: Mon Sep 06 07:30:36 BST 2021 - Embargo updated*
rioxxterms.freetoread.startdate: 2022-01-01


