End-to-End Speech Recognition for Endangered Languages of Nepal
Published version
Peer-reviewed
Abstract
This paper presents three experiments to identify the most effective and efficient ASR pipeline for facilitating the documentation and preservation of endangered languages, which are often extremely low-resourced. With data from two languages of Nepal, Dzardzongke and Newar, we show that model improvements vary with the amount of data available, and that transfer learning as well as a range of modifications (e.g. normalising amplitude and pitch) can be effective, but that a consistently standardised orthography as NLP input and post-training dictionary corrections improve results even more.
Journal Title
ComputEL 2024 - 7th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
Conference Name
Comput-EL workshop at the EACL
Publisher
Association for Computational Linguistics
Sponsorship
Endangered Languages Documentation Programme (ELDP) (SG0716)
AHRC (via SOAS University of London) (R420)