End-to-End Speech Recognition for Endangered Languages of Nepal

Published version
Peer-reviewed

Type

Conference Object

Authors

O’Neill, A 
Coto-Solano, R 

Abstract

This paper presents three experiments to test the most effective and efficient ASR pipeline to facilitate the documentation and preservation of endangered languages, which are often extremely low-resourced. With data from two languages in Nepal (Dzardzongke and Newar) we show that model improvements are different for different masses of data, and that transfer learning as well as a range of modifications (e.g. normalising amplitude and pitch) can be effective, but that a consistently-standardised orthography as NLP input and post-training dictionary corrections improve results even more.

Journal Title

ComputEL 2024 - 7th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop

Conference Name

ComputEL workshop at EACL

Publisher

Association for Computational Linguistics

Sponsorship
Endangered Languages Documentation Programme (ELDP) (SG0716)
AHRC (via SOAS University of London) (R420)