An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora

Authors

  • Igor Macedo Quintanilha, Universidade Federal do Rio de Janeiro
  • Sergio Lima Netto, Federal University of Rio de Janeiro
  • Luiz Wagner Pereira Biscainho

DOI:

https://doi.org/10.14209/jcis.2020.25

Abstract

In this work, we present a baseline end-to-end system based on deep learning for automatic speech recognition in Brazilian Portuguese. To build this model, we assemble a speech corpus of 158 hours of annotated speech from four individual datasets, three of them publicly available, and a text corpus of 10.2 million sentences. We train an acoustic model based on the DeepSpeech 2 network, with two convolutional and five bidirectional recurrent layers. By adding a newly trained character-level 15-gram language model, we achieve a character error rate of only 10.49% and a word error rate of 25.45%, which are on a par with results reported for other languages using a similar amount of training data.
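The abstract reports results as character error rate (CER) and word error rate (WER). As a hedged illustration, and not the authors' evaluation code, both metrics are commonly computed as a Levenshtein edit distance (substitutions, insertions, and deletions, each costing 1) normalized by the length of the reference: over characters for CER and over whitespace-separated words for WER. This sketch assumes that spaces count as characters for CER, that no additional text normalization is applied, and that the reference is non-empty. The function names `edit_distance`, `cer`, and `wer` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for each edit)."""
    # prev[j] holds the distance between the previous prefix of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion from the reference
                          curr[j - 1] + 1,     # insertion into the hypothesis
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[len(hyp)]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits / reference length (spaces included)."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)
```

For example, with the reference "o gato subiu no telhado" and the hypothesis "o gato subiu no telhada", one of five words is substituted, so `wer` returns 0.2, while `cer` returns 1/23 (one character edit over 23 reference characters). The example shows how a single wrong character can cost a whole word, which is consistent with WER being considerably higher than CER.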

Published

2020-09-01

How to Cite

Macedo Quintanilha, I., Lima Netto, S., & Pereira Biscainho, L. W. (2020). An open-source end-to-end ASR system for Brazilian Portuguese using DNNs built from newly assembled corpora. Journal of Communication and Information Systems, 35(1), 230–242. https://doi.org/10.14209/jcis.2020.25

Section

Regular Papers
Received 2020-03-23
Accepted 2020-07-28