
Crosslinguistic Semantic Textual Similarity of Buddhist Chinese and Classical Tibetan

Published version
Peer-reviewed

Type

Article

Abstract

In this paper we present the first-ever procedure for identifying highly similar sequences of text in Chinese and Tibetan translations of Buddhist sutra literature. We initially propose this procedure as an aid to scholars engaged in the philological study of Buddhist documents. We create a cross-lingual embedding space and compute the cosine similarity of average sequence vectors within it, which yields unsupervised cross-lingual parallel alignments of similar text at the word, sentence, and even paragraph level. Initial results show that our method lays a solid foundation for the future development of a fully-fledged Information Retrieval tool for these (and potentially other) low-resource historical languages.
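The abstract does not specify the embedding model, tokenisation, or any similarity threshold, so the following is only a minimal sketch of the general idea. It assumes that a shared Chinese-Tibetan embedding space already exists as a plain dictionary of word vectors, and it interprets an "average sequence vector" as the mean of the vectors of a sequence's tokens. The toy 3-dimensional vectors, the Wylie-transliterated Tibetan tokens, and the function names `average_vector`, `cosine`, and `best_alignment` are all hypothetical and serve illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

def average_vector(tokens: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Mean of the vectors of all tokens found in the shared embedding space."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        raise ValueError("no token of the sequence is in the embedding space")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_alignment(source: Sequence[str],
                   targets: Sequence[Sequence[str]],
                   embeddings: Dict[str, Vector]) -> Tuple[int, float]:
    """Return (index, score) of the target sequence whose average vector is
    most similar to the source's average vector. Raises ValueError if
    targets is empty."""
    src = average_vector(source, embeddings)
    scores = [cosine(src, average_vector(t, embeddings)) for t in targets]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical toy vectors standing in for a real cross-lingual space.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "佛": [0.9, 0.1, 0.0],           # Chinese "Buddha"
    "說": [0.1, 0.9, 0.0],           # Chinese "to speak"
    "法": [0.0, 0.2, 0.9],           # Chinese "dharma"
    "sangs rgyas": [0.8, 0.2, 0.1],  # Tibetan "Buddha"
    "gsungs": [0.2, 0.8, 0.0],       # Tibetan "spoke"
    "chos": [0.1, 0.1, 0.9],         # Tibetan "dharma"
}

if __name__ == "__main__":
    idx, score = best_alignment(["佛", "說"],
                                [["sangs rgyas", "gsungs"], ["chos"]],
                                TOY_EMBEDDINGS)
    # The Tibetan sequence at index 0 is the closest match here.
    print(idx, round(score, 3))
```

Averaging token vectors ignores word order, which keeps the comparison cheap and applicable to sequences of any length (word, sentence, or paragraph); a real system would also need a cut-off score below which no alignment is reported.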

Keywords

46 Information and Computing Sciences, 47 Language, Communication and Culture, 4704 Linguistics

Journal Title

Journal of Open Humanities Data

Journal ISSN

2059-481X

Volume

8

Publisher

Ubiquity Press, Ltd.