Segmenting and POS tagging Classical Tibetan using a Memory-Based Tagger
Published version
Peer-reviewed
Authors
Meelen, Marieke https://orcid.org/0000-0003-0395-8372
Hill, Nathan
Abstract
This paper presents a new approach to two challenging NLP tasks in Classical Tibetan: word segmentation and Part-of-Speech (POS) tagging. We demonstrate how both problems can be approached in the same way, by generating a memory-based tagger that assigns 1) segmentation tags and 2) POS tags to a test corpus consisting of unsegmented lines of Tibetan characters. We propose a three-stage workflow and evaluate the results of both the segmentation and the POS tagging tasks. We argue that the Memory-Based Tagger (MBT) and the proposed workflow not only provide an adequate solution to these NLP challenges but are also highly efficient tools for building larger annotated corpora of Tibetan.
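The abstract's central idea, that word segmentation can be handled as a tagging problem just like POS tagging, can be illustrated with a toy sketch. This is not the authors' pipeline or the MBT software. It assumes a hypothetical two-tag scheme over syllables (B for the first syllable of a word, I for any later syllable), romanized Wylie input with one list item per syllable, a window of previous, current and next syllable as features, and a plain nearest-neighbour overlap metric. MBT itself builds on the TiMBL learner and uses its own feature templates, algorithms and weighting; the paper's actual segmentation tagset may differ.

```python
from collections import Counter

# Hypothetical tag scheme (an assumption, not necessarily the paper's tagset):
# "B" = syllable begins a word, "I" = syllable continues the current word.

def to_tags(words):
    """Flatten segmented words (lists of syllables) into (syllable, tag) pairs."""
    return [(syl, "B" if i == 0 else "I") for word in words for i, syl in enumerate(word)]

def from_tags(pairs):
    """Rebuild words from (syllable, tag) pairs; a leading "I" still opens a word."""
    words = []
    for syl, tag in pairs:
        if tag == "B" or not words:
            words.append([syl])
        else:
            words[-1].append(syl)
    return words

def features(syls, i):
    """Feature window: previous, current and next syllable ("_" pads the edges)."""
    prev = syls[i - 1] if i > 0 else "_"
    nxt = syls[i + 1] if i + 1 < len(syls) else "_"
    return (prev, syls[i], nxt)

class MemoryBasedTagger:
    """Toy memory-based learner: keep every training instance and label new ones
    by majority vote among the stored instances at the smallest overlap distance."""

    def __init__(self):
        self.memory = []  # list of (feature tuple, tag)

    def train(self, tagged_sentences):
        for pairs in tagged_sentences:
            syls = [syl for syl, _ in pairs]
            for i, (_, tag) in enumerate(pairs):
                self.memory.append((features(syls, i), tag))

    def classify(self, feats):
        # Overlap metric: count the feature positions that differ.
        scored = [(sum(a != b for a, b in zip(feats, f)), tag) for f, tag in self.memory]
        best = min(d for d, _ in scored)
        votes = Counter(tag for d, tag in scored if d == best)
        return votes.most_common(1)[0][0]

    def segment(self, syllables):
        tags = [self.classify(features(syllables, i)) for i in range(len(syllables))]
        return from_tags(list(zip(syllables, tags)))

# Toy training data in Wylie transliteration:
# "bla ma" (lama), "sangs rgyas" (Buddha), "dang" (and).
training = [
    to_tags([["bla", "ma"], ["dang"], ["sangs", "rgyas"]]),
    to_tags([["sangs", "rgyas"], ["dang"], ["bla", "ma"]]),
]
tagger = MemoryBasedTagger()
tagger.train(training)
print(tagger.segment(["bla", "ma", "dang", "bla", "ma"]))
# → [['bla', 'ma'], ['dang'], ['bla', 'ma']]
```

Because the classifier only sees labels, the same train/classify steps could in principle be reused for the POS stage by storing POS tags instead of B/I tags, which mirrors the abstract's point that both tasks can be approached in the same way.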
Keywords
47 Language, Communication and Culture, 4704 Linguistics
Journal Title
Himalayan Linguistics
Journal ISSN
1544-7502
Volume Title
16
Publisher
University of California
Sponsorship
European Research Council (ERC grant IDs 609823 and 269752)