Old Catalan Morphosyntax: Developing an Annotated Corpus
Journal Title
Journal of Open Humanities Data
ISSN
2059-481X
Publisher
Ubiquity Press
Type
Article
This Version
AM (Accepted Manuscript)
Citation
Meelen, M., & Pujol I Campeny, A. Old Catalan Morphosyntax: Developing an Annotated Corpus. Journal of Open Humanities Data. https://doi.org/10.17863/CAM.79124
Abstract
This paper presents a full procedure for the development of a Part-of-Speech (POS) tagged corpus of Old Catalan. As an extremely low-resource language with rich inflection and frequent homographs, Old Catalan poses non-trivial problems in the development of a searchable constituency-based treebank. We demonstrate, however, that a semi-supervised method of incrementally building training data using both neural and memory-based taggers, together with the Pyrrha annotation tool, is highly efficient and yields accurate results. We propose that this simple and effective method could easily be extended to other low-resource historical languages for which no NLP tools exist yet.
Sponsorship
Research that partially facilitated the work presented in this article was funded by the British Academy (PDF grant pf170063), and the Cambridge Humanities Research Grant (tier 1 grant, GANT011262). Additionally, this work has been supported by the French government, through the UCAJEDI Investments in the Future project managed by the National Research Agency (ANR) with the reference number C870A06228 – EOTP: SYVACA – D112.
Funder references
British Academy (PF170063)
British Academy (SRG18R1\181450)
Embargo Lift Date
2024-12-21
Identifiers
This record's DOI: https://doi.org/10.17863/CAM.79124
This record's URL: https://www.repository.cam.ac.uk/handle/1810/331671