Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction
Publication Date
2016-05-25
Journal Title
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)
Conference Name
LREC 2016, Tenth International Conference on Language Resources and Evaluation
Publisher
European Language Resources Association
Pages
4402-4409
Language
English
Type
Conference Object
This Version
VoR
Citation
Kaplan, D., Rubens, N., Teufel, S., & Tokunaga, T. (2016). Solving the AL Chicken-and-Egg Corpus and Model Problem: Model-free Active Learning for Phenomena-driven Corpus Construction. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 4402-4409. http://www.lrec-conf.org/proceedings/lrec2016/summaries/28.html
Abstract
Active learning (AL) is often used in corpus construction (CC) to select “informative” documents for annotation. This is ideal for focusing annotation effort when not all documents can be annotated, but it has the limitation that it is carried out in a closed loop, selecting points that will improve an existing model. For phenomena-driven and exploratory CC, the lack of existing models and of specific task(s) that would use the corpus makes traditional AL inapplicable. In this paper we propose a novel method for model-free AL that uses characteristics of the phenomena to select documents for annotation. The method can also supplement traditional closed-loop AL-based CC to broaden the utility of the resulting corpus beyond a single task. We introduce our tool, MOVE, and show its potential with a real-world case study.
Keywords
corpus construction, active learning, tools
Identifiers
This record's URL: https://www.repository.cam.ac.uk/handle/1810/266703
Rights
Attribution-NonCommercial 4.0 International