NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties
Proceedings of the EURALI workshop at LREC 2022
Faggionato, C., Hill, N., & Meelen, M. (2022). NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. Proceedings of the EURALI workshop at LREC 2022. https://doi.org/10.17863/CAM.84900
In this paper we present our work in progress on a fully implemented pipeline for creating deeply annotated corpora of a number of historical and contemporary Tibetan and Newar varieties. Our off-the-shelf tools allow researchers to create corpora with five different layers of annotation, ranging from morphosyntactic to information-structural annotation. We build on and optimise existing tools (in line with FAIR principles) and develop new ones, and we show how they can be adapted to other Tibetan and Newar languages, most notably modern endangered languages that are both extremely low-resourced and under-researched.
This research is AHRC-funded (AH/V011235/1).
External DOI: https://doi.org/10.17863/CAM.84900
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337486
All Rights Reserved
Licence URL: http://www.rioxx.net/licenses/all-rights-reserved