Old Catalan Morphosyntax: Developing an Annotated Corpus
Pujol I Campeny, Afra
Journal of Open Humanities Data
Meelen, M., & Pujol I Campeny, A. Old Catalan Morphosyntax: Developing an Annotated Corpus. Journal of Open Humanities Data. https://doi.org/10.17863/CAM.79124
This paper presents a full procedure for the development of a Part-of-Speech (POS) tagged corpus of Old Catalan. As an extremely low-resource language with rich inflection and frequent homographs, Old Catalan poses non-trivial problems for the development of a searchable constituency-based treebank. We demonstrate, however, that a semi-supervised method of incrementally building training data, using both neural and memory-based taggers together with the Pyrrha annotation tool, is highly efficient and yields accurate results. We propose that this simple and effective method could easily be extended to other low-resource historical languages for which no NLP tools exist yet.
Research that partially facilitated the work presented in this article was funded by the British Academy (PDF grant pf170063), and the Cambridge Humanities Research Grant (tier 1 grant, GANT011262). Additionally, this work has been supported by the French government, through the UCAJEDI Investments in the Future project managed by the National Research Agency (ANR) with the reference number C870A06228 – EOTP : SYVACA – D112.
British Academy (PF170063)
British Academy (SRG18R1\181450)
This record's DOI: https://doi.org/10.17863/CAM.79124
This record's URL: https://www.repository.cam.ac.uk/handle/1810/331671
Attribution 4.0 International
Licence URL: https://creativecommons.org/licenses/by/4.0/