Real-Time Neural Text-to-Speech with Sequence-to-Sequence Acoustic Model and WaveGlow or Single Gaussian WaveRNN Vocoders

T Okamoto, T Toda, Y Shiga, H Kawai - INTERSPEECH, 2019 - isca-archive.org
Abstract
This paper investigates real-time, high-fidelity neural text-to-speech (TTS) systems. For real-time neural vocoders, WaveGlow is introduced and single Gaussian (SG) WaveRNN is proposed. The proposed SG-WaveRNN can predict continuous-valued speech waveforms in half the synthesis time of vanilla WaveRNN with dual softmax for 16-bit audio prediction. Additionally, a sequence-to-sequence (seq2seq) acoustic model (AM) for pitch-accent languages, such as Japanese, is investigated by introducing the Tacotron 2 architecture. In the seq2seq AM, full-context labels extracted from a text analyzer are used as input and are directly converted into mel-spectrograms. The results of a subjective experiment using a Japanese female corpus indicate that the proposed SG-WaveRNN vocoder with noise shaping can synthesize high-quality speech waveforms, and that real-time high-fidelity neural TTS systems can be realized with the seq2seq AM and WaveGlow or SG-WaveRNN vocoders. In particular, the seq2seq AM and the WaveGlow vocoder conditioned on mel-spectrograms, both with simple PyTorch implementations, can be realized with real-time factors of 0.06 and 0.10 for inference on a GPU.
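
For readers unfamiliar with the metric, a real-time factor (RTF) is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below only illustrates that arithmetic. The function names, the 10-second utterance, and the reading of 0.06 and 0.10 as separate per-stage figures that add up for a sequential pipeline are assumptions made for illustration, not details confirmed by the abstract.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Return the wall-clock time implied by a given RTF and audio duration."""
    return rtf * audio_seconds


# Hypothetical example: a 10-second utterance.
utterance_seconds = 10.0

# Assumption: 0.06 and 0.10 are per-stage RTFs (acoustic model, vocoder)
# and the two stages run one after the other, so their times add.
am_seconds = synthesis_time(0.06, utterance_seconds)
vocoder_seconds = synthesis_time(0.10, utterance_seconds)
pipeline_rtf = real_time_factor(am_seconds + vocoder_seconds, utterance_seconds)

print(f"acoustic model: {am_seconds:.2f} s, vocoder: {vocoder_seconds:.2f} s")
print(f"pipeline RTF under these assumptions: {pipeline_rtf:.2f}")
```

Under these assumptions the whole pipeline would still run well below an RTF of 1.0, which is the sense in which the abstract calls the system real-time.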