An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora

  • Igor Macedo Quintanilha, Federal University of Rio de Janeiro
  • Sergio Lima Netto, Federal University of Rio de Janeiro
  • Luiz Wagner Pereira Biscainho

Abstract

In this work, we present a baseline end-to-end system based on deep learning for automatic speech recognition in Brazilian Portuguese. To build the model, we assemble a speech corpus of 158 hours of annotated speech from four individual datasets, three of them publicly available, together with a text corpus of 10.2 million sentences. We train an acoustic model based on the DeepSpeech 2 network, with two convolutional and five bidirectional recurrent layers. By adding a newly trained character-level 15-gram language model, we achieve a character error rate of 10.49% and a word error rate of 25.45%, which are on a par with results reported for other languages using a similar amount of training data.
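
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length. It assumes the standard definitions and is not the scoring script used in the paper; the function names `edit_distance`, `cer`, and `wer` are hypothetical, and the paper's text normalization is not reproduced.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word (whitespace split)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Illustrative sentence pair (made up for this sketch, not taken from the corpora).
    ref = "o gato preto"
    hyp = "o gato prato"
    print(f"CER = {cer(ref, hyp):.4f}")
    print(f"WER = {wer(ref, hyp):.4f}")
```

A single wrong character inside a word counts as one error among all reference characters for CER but turns the whole word into an error for WER, which is why WER is typically the larger of the two figures.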

Published
01-09-2020
How to Cite
Macedo Quintanilha, I., Lima Netto, S., & Pereira Biscainho, L. (2020). An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora. Journal of Communication and Information Systems, 35(1), 230-242. https://doi.org/10.14209/jcis.2020.25
Section
Regular Papers