An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics
DOI:
https://doi.org/10.14209/jcis.2006.11
Abstract
Research in speech synthesis has made great progress recently, perhaps motivated by its numerous applications, such as text-to-speech converters and dialog systems. The technical literature reports several improvements to existing state-of-the-art techniques, as well as new ideas for altering voice characteristics, with eventual application to different languages. Nevertheless, despite the attention the field has been receiving, unit selection and concatenation of waveform segments remains the most popular approach available today. In this paper, we report how a synthesizer for Brazilian Portuguese was constructed using a technique in which the speech waveform is generated from parameters determined directly from Hidden Markov Models (HMMs). Compared with systems based on unit selection and concatenation, the proposed synthesizer has the advantage of being trainable, and it uses contextual factors that include information at different levels of the following acoustic units: phones, syllables, words, phrases and utterances. This information is brought into effect through a set of questions for context clustering. Thus, both the spectral and the prosodic characteristics of the system are managed by decision trees generated for each of the following parameters: mel-cepstral coefficients, fundamental frequency and state durations. As is typical of HMM-based techniques, synthesized speech with quality comparable to that of commercial applications built with the unit selection and concatenation approach can be obtained even from a database as small as eighteen minutes of speech. This was tested through a subjective comparison of samples from the synthesizer in question with samples from other systems currently available for Brazilian Portuguese.
Published
2015-06-18
How to Cite
Maia, R., Zen, H., Tokuda, K., Kitamura, T., & Resende Jr., F. G. V. (2015). An HMM-Based Brazilian Portuguese Speech Synthesizer and Its Characteristics. Journal of Communication and Information Systems, 21(2). https://doi.org/10.14209/jcis.2006.11
Section
Regular Papers
License
Authors who publish in this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a CC BY-NC 4.0 (Attribution-NonCommercial 4.0 International) license that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors can enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) before and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
___________
Received 2015-06-18
Accepted 2015-06-18
Published 2015-06-18