Automatic speech recognition system for Karelian
Keywords:
low-resource languages, automatic transcription, Karelian language, Time Delay Neural Network
Abstract
Introduction: The number of studies devoted to automatic processing of low-resource languages is growing. The lack of training data is a significant obstacle to the development of speech technologies for such languages. Purpose: To develop an automatic speech recognition system for Karelian. Results: We present a system for automatic speech recognition in Karelian. We trained acoustic models based on time-delay neural networks and hidden Markov models. The system was trained on a speech corpus composed of radio broadcast recordings, extended with audio data modified by augmentation techniques. The language model was built from both written texts and transcripts of the training part of the speech corpus. We explored various coefficients for interpolating a language model trained on the transcripts with a language model trained on the written texts. The best word error rate was 25.81%, which is comparable with results reported for other low-resource languages. We have also collected a training data set that includes Karelian speech recordings with transcripts, as well as a text corpus. Practical relevance: The results may be useful for the development of automatic speech recognition systems not only for Karelian but also for other low-resource languages. In addition, the developed system may be useful to researchers of the Karelian language, providing them with an effective tool for recording and processing Karelian language data.
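The interpolation of the two language models mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation and compares candidate coefficients by perplexity on held-out text. The abstract does not state how the coefficients were compared, so the selection criterion, the function names, and the toy probabilities below are all assumptions made for illustration, not the authors' procedure.

```python
import math

def interpolate(p_transcripts, p_texts, lam):
    """Linear interpolation of two model probabilities for the same word:
    lam * P_transcripts(w | h) + (1 - lam) * P_texts(w | h)."""
    return lam * p_transcripts + (1.0 - lam) * p_texts

def perplexity(prob_pairs, lam):
    """Perplexity of held-out words under the interpolated model.
    Each item is a pair (P_transcripts, P_texts) for one held-out word."""
    log_sum = sum(math.log(interpolate(pa, pb, lam)) for pa, pb in prob_pairs)
    return math.exp(-log_sum / len(prob_pairs))

def best_coefficient(prob_pairs, candidates):
    """Return the candidate coefficient with the lowest held-out perplexity."""
    return min(candidates, key=lambda lam: perplexity(prob_pairs, lam))

# Made-up per-word probabilities, for illustration only (not from the paper).
held_out = [(0.20, 0.05), (0.01, 0.10), (0.15, 0.12), (0.02, 0.08)]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

best = best_coefficient(held_out, candidates)
print(f"selected coefficient: {best}")
```

In a full recognition setup the coefficient could equally be chosen by the word error rate on a development set; perplexity is used here only because it can be computed without running the decoder.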