Improving FFTNet vocoder with noise shaping and subband approaches

T Okamoto, T Toda, Y Shiga, H Kawai
2018 IEEE Spoken Language Technology Workshop (SLT), 2018 - ieeexplore.ieee.org
Although FFTNet neural vocoders can synthesize speech waveforms in real time, the synthesized speech quality is worse than that of WaveNet vocoders. To improve the synthesized speech quality of FFTNet while ensuring real-time synthesis, residual connections are introduced to enhance the prediction accuracy. Additionally, time-invariant noise shaping and subband approaches, which significantly improve the synthesized speech quality of WaveNet vocoders, are applied. A subband FFTNet vocoder with multiband input is also proposed to directly compensate for the phase shift between subbands. The proposed approaches are evaluated through experiments using a Japanese male corpus with a sampling frequency of 16 kHz. The results are compared with speech synthesized by the STRAIGHT vocoder without mel-cepstral compression and by conventional FFTNet and WaveNet vocoders. The proposed approaches are shown to successfully improve the synthesized speech quality of the FFTNet vocoder. In particular, the use of noise shaping enables FFTNet to significantly outperform the STRAIGHT vocoder.
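As a rough illustration of the residual-connection idea mentioned in the abstract, the sketch below shows a single FFTNet-style layer in NumPy. It is not the authors' implementation: the abstract does not say where the residual connection is placed, so adding the pre-activation sum back onto the output of the 1x1 transform is an assumption, and the function name `fftnet_layer` and the weight names `w_left`, `w_right`, and `w_out` are hypothetical. Conditioning on acoustic features is omitted.

```python
import numpy as np

def fftnet_layer(x, w_left, w_right, w_out, residual=True):
    """One FFTNet-style layer on an input of shape (channels, length).

    The input is split into two halves along time, each half gets its own
    1x1 (per-sample) linear transform, and the two results are summed, so
    the output is half as long as the input.
    """
    half = x.shape[1] // 2
    left, right = x[:, :half], x[:, half:2 * half]

    # Sum of the two per-half 1x1 transforms: shape (channels, half).
    z = w_left @ left + w_right @ right

    # ReLU followed by a second 1x1 transform.
    h = w_out @ np.maximum(z, 0.0)

    # Assumed residual placement: skip around the ReLU + 1x1 transform.
    if residual:
        h = h + z

    return np.maximum(h, 0.0)

# Example: 4 channels, 8 time steps in, 4 time steps out.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_left, w_right, w_out = (rng.standard_normal((4, 4)) for _ in range(3))
y = fftnet_layer(x, w_left, w_right, w_out)
```

Stacking such layers halves the sequence length each time, which is what gives the architecture its small, fixed computation per output sample. The skip path lets each layer pass its pre-activation sum through unchanged when the extra transform contributes little.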