Improving FFTNet vocoder with noise shaping and subband approaches

T Okamoto, T Toda, Y Shiga… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

2018 IEEE Spoken Language Technology Workshop (SLT), 2018 • ieeexplore.ieee.org

Although FFTNet neural vocoders can synthesize speech waveforms in real time, the synthesized speech quality is worse than that of WaveNet vocoders. To improve the synthesized speech quality of FFTNet while ensuring real-time synthesis, residual connections are introduced to enhance the prediction accuracy. Additionally, time-invariant noise shaping and subband approaches, which significantly improve the synthesized speech quality of WaveNet vocoders, are applied. A subband FFTNet vocoder with multiband input is also proposed to directly compensate for the phase shift between subbands. The proposed approaches are evaluated through experiments using a Japanese male corpus with a sampling frequency of 16 kHz. The synthesized speech is compared with that of the STRAIGHT vocoder without mel-cepstral compression and with that of conventional FFTNet and WaveNet vocoders. The proposed approaches are shown to successfully improve the synthesized speech quality of the FFTNet vocoder. In particular, the use of noise shaping enables FFTNet to significantly outperform the STRAIGHT vocoder.

Cited by 21