An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics

  • R. Maia
  • H. Zen
  • K. Tokuda
  • T. Kitamura
  • F. G. V. Resende Jr.

Abstract

Research in speech synthesis has made great progress recently, perhaps motivated by its numerous applications, such as text-to-speech converters and dialog systems. Several improvements to existing state-of-the-art techniques have been reported in the technical literature, along with new ideas for altering voice characteristics and their eventual application to different languages. Nevertheless, despite the attention the field has been receiving, unit selection and concatenation of waveform segments remains the most popular approach among those currently available. In this paper, we report how a synthesizer for Brazilian Portuguese was constructed using a technique in which the speech waveform is generated from parameters determined directly from Hidden Markov Models (HMMs). Compared with systems based on unit selection and concatenation, the proposed synthesizer has the advantage of being trainable, using contextual factors that include information at different levels of the following acoustic units: phones, syllables, words, phrases and utterances. This information is brought into effect through a set of questions for context clustering. Thus, both the spectral and the prosodic characteristics of the system are managed by decision trees generated for each of the following parameters: mel-cepstral coefficients, fundamental frequency and state durations. As is typical of the HMM-based technique, synthesized speech with quality comparable to that of commercial applications built under the unit selection and concatenation approach can be obtained even from a database as small as eighteen minutes of speech. This was tested by a subjective comparison of samples from the synthesizer in question and from other systems currently available for Brazilian Portuguese.
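As a rough illustration of the context-clustering idea described above, the Python sketch below treats each question as a yes/no predicate over the contextual factors of a phone and picks the question that best splits a small set of samples. Everything in it is an assumption made for illustration: the Context fields, the question names, the toy duration values, and the squared-error criterion are hypothetical and are not taken from the paper, whose actual question set and tree-growing criterion are not given in this abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical contextual factors for one phone (illustrative only)."""
    phone: str
    prev_phone: str
    next_phone: str
    syllable_stressed: bool
    syllable_position_in_word: int  # 1-based


Question = Callable[[Context], bool]

VOWELS = {"a", "e", "i", "o", "u"}

# Hypothetical yes/no questions at the phone, syllable and word levels.
QUESTIONS: Dict[str, Question] = {
    "C-Vowel": lambda c: c.phone in VOWELS,
    "R-Nasal": lambda c: c.next_phone in {"m", "n"},
    "Syl-Stressed": lambda c: c.syllable_stressed,
    "Word-Initial-Syl": lambda c: c.syllable_position_in_word == 1,
}


def answer_questions(ctx: Context) -> Dict[str, bool]:
    """Evaluate every question on one context."""
    return {name: q(ctx) for name, q in QUESTIONS.items()}


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_question(samples: List[Tuple[Context, float]]) -> Optional[str]:
    """Return the question whose split most reduces squared error.

    Squared error is a stand-in criterion chosen for simplicity here;
    it is not claimed to be the criterion used in the paper.
    """
    best_name, best_cost = None, sse([v for _, v in samples])
    for name, question in QUESTIONS.items():
        yes = [v for c, v in samples if question(c)]
        no = [v for c, v in samples if not question(c)]
        if not yes or not no:
            continue  # a question that does not split the data is useless
        cost = sse(yes) + sse(no)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name


# Toy data: (context, state duration in frames); values are made up.
SAMPLES: List[Tuple[Context, float]] = [
    (Context("a", "k", "z", True, 1), 12.0),
    (Context("a", "k", "m", False, 2), 6.0),
    (Context("k", "a", "a", True, 2), 11.0),
    (Context("z", "a", "a", False, 2), 5.0),
]

if __name__ == "__main__":
    print(best_question(SAMPLES))
```

Applying the same selection recursively to the "yes" and "no" subsets would grow a decision tree; under this sketch's assumptions, one such tree would be built separately for each parameter stream (spectrum, fundamental frequency, state durations).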
Published
18-06-2015
How to Cite
Maia, R., Zen, H., Tokuda, K., Kitamura, T., & Resende Jr., F. G. V. (2015). An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics. Journal of Communication and Information Systems, 21(2). https://doi.org/10.14209/jcis.2006.11
Section
Regular Papers