NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties
Journal Title
Proceedings of the EURALI workshop at LREC 2022
Conference Name
LREC-EURALI workshop
Type
Conference Object
This Version
AM (Accepted Manuscript)
Citation
Faggionato, C., Hill, N., & Meelen, M. NLP Pipeline for Annotating (Endangered) Tibetan and Newar Varieties. Proceedings of the EURALI workshop at LREC 2022. https://doi.org/10.17863/CAM.84900
Abstract
In this paper we present our work in progress on a fully implemented pipeline to create deeply annotated corpora of a number of historical and contemporary Tibetan and Newar varieties. Our off-the-shelf tools allow researchers to create corpora with five different layers of annotation, ranging from morphosyntactic to information-structural annotation. We build on and optimise existing tools (in line with FAIR principles) and develop new ones, and we show how they can be adapted to other Tibetan and Newar languages, most notably modern endangered languages that are both extremely low-resourced and under-researched.
Sponsorship
This research is AHRC-funded (AH/V011235/1).
Embargo Lift Date
2023-05-25
Identifiers
External DOI: https://doi.org/10.17863/CAM.84900
This record's URL: https://www.repository.cam.ac.uk/handle/1810/337486