dc.contributor.author: Vulic, I
dc.contributor.author: Mrkšic, N
dc.contributor.author: Reichart, R
dc.contributor.author: Séaghdha, D
dc.contributor.author: Young, S
dc.contributor.author: Korhonen, A
dc.date.accessioned: 2017-06-05T12:59:01Z
dc.date.available: 2017-06-05T12:59:01Z
dc.date.issued: 2017
dc.identifier.isbn: 9781945626753
dc.identifier.uri: https://www.repository.cam.ac.uk/handle/1810/264637
dc.description.abstract: Morphologically rich languages accentuate two properties of distributional vector space models: 1) the difficulty of inducing accurate representations for low-frequency word forms; and 2) insensitivity to distinct lexical relations that have similar distributional signatures. These effects are detrimental for language understanding systems, which may infer that inexpensive is a rephrasing for expensive or may not associate acquire with acquires. In this work, we propose a novel morph-fitting procedure which moves past the use of curated semantic lexicons for improving distributional vector spaces. Instead, our method injects morphological constraints generated using simple language-specific rules, pulling inflectional forms of the same word close together and pushing derivational antonyms far apart. In intrinsic evaluation over four languages, we show that our approach: 1) improves low-frequency word estimates; and 2) boosts the semantic quality of the entire word vector collection. Finally, we show that morph-fitted vectors yield large gains in the downstream task of dialogue state tracking, highlighting the importance of morphology for tackling long-tail phenomena in language understanding tasks.
dc.language: English
dc.language.iso: en
dc.publisher: Association for Computational Linguistics
dc.rights: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Semantic specialisation
dc.subject: Morphologically complex languages
dc.subject: Vector space models
dc.subject: Dialogue state tracking
dc.subject: Word embeddings
dc.title: Morph-fitting: Fine-tuning word vector spaces with simple language-specific rules
dc.type: Conference Object
prism.publicationDate: 2017
prism.publicationName: ACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
dc.identifier.doi: 10.17863/CAM.10176
dcterms.dateAccepted: 2017-03-30
rioxxterms.versionofrecord: 10.18653/v1/P17-1006
rioxxterms.version: VoR
rioxxterms.licenseref.uri: http://creativecommons.org/licenses/by/4.0/
rioxxterms.licenseref.startdate: 2017-08-01
dc.contributor.orcid: Young, Steve [0000-0002-2319-3074]
rioxxterms.type: Conference Paper/Proceeding/Abstract
pubs.funder-project-id: European Research Council (648909)
pubs.conference-name: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
pubs.conference-start-date: 2017-07
cam.orpheus.success: Thu Nov 05 11:57:25 GMT 2020 - The item has an open VoR version.
pubs.conference-finish-date: 2017-07
rioxxterms.freetoread.startdate: 2100-01-01


Except where otherwise noted, this item's licence is described as Attribution 4.0 International