Lemmatization for Ancient Greek: An experimental assessment of the state of the art

Published version
Peer-reviewed

Type

Article

Authors

McGillivray, Barbara (ORCID: https://orcid.org/0000-0003-3426-8200)

Abstract

This short article presents the results of accuracy tests for currently available Ancient Greek lemmatizers and recently published lemmatized corpora. We ran a blinded experiment in which three highly proficient readers of Ancient Greek evaluated the output of the CLTK lemmatizer, of the CLTK backoff lemmatizer, and of GLEM, together with the lemmatizations offered by the Diorisis corpus and the Lemmatized Ancient Greek Texts repository. The texts chosen for this experiment are Homer, Iliad 1.1–279 and Lysias 7. The results suggest that lemmatization methods using large lexica as well as part-of-speech tagging (such as those employed by the Diorisis corpus and the CLTK backoff lemmatizer) are more reliable than methods that rely more heavily on machine learning and use smaller lexica.

Keywords

47 Language, Communication and Culture; 4705 Literary Studies

Journal Title

Journal of Greek Linguistics

Journal ISSN

1566-5844
1569-9846

Volume Title

20

Publisher

Brill

Sponsorship

Alan Turing Institute (EP/N510129/1)