Mining the UK web archive for semantic change detection

Accepted version
Peer-reviewed

Type

Conference Object

Authors

Tsakalidis, A 
Bazzi, M 
Cucuringu, M 
Basile, P 
McGillivray, Barbara (ORCID: https://orcid.org/0000-0003-3426-8200)

Abstract

Semantic change detection (i.e., identifying words whose meaning has changed over time) started emerging as a growing area of research over the past decade, with important downstream applications in natural language processing, historical linguistics and computational social science. However, several obstacles make progress in the domain slow and difficult. These pertain primarily to the lack of well-established gold standard datasets, resources to study the problem at a fine-grained temporal resolution, and quantitative evaluation approaches. In this work, we aim to mitigate these issues by (a) releasing a new labelled dataset of more than 47K word vectors trained on the UK Web Archive over a short time-frame (2000-2013); (b) proposing a variant of Procrustes alignment to detect words that have undergone semantic shift; and (c) introducing a rank-based approach for evaluation purposes. Through extensive numerical experiments and validation, we illustrate the effectiveness of our approach against competitive baselines. Finally, we also make our resources publicly available to further enable research in the domain.

Journal Title

International Conference Recent Advances in Natural Language Processing, RANLP

Conference Name

Recent Advances in Natural Language Processing

Journal ISSN

1313-8502

Volume Title

2019-September

Publisher

Incoma Ltd., Shoumen, Bulgaria

Rights

All rights reserved

Sponsorship

Alan Turing Institute (EP/N510129/1)
This work was supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1 and the seed funding grant SF099.