Pronunciation adaptive self speaking agent using WaveGrad

T. Tanaka, R. Komatsu, T. Okamoto, T. Shinozaki
Proc. 2nd Workshop on Self-Supervised Learning for Audio and Speech Processing, 2022. aaai-sas-2022.github.io
Abstract
The ability to automatically learn to speak through observation and dialogue, without relying on labeled training data, is essential for intelligent robots or agents to talk to humans flexibly and expressively on an equal footing. Previous methods have demonstrated that automatic spoken language acquisition becomes possible by combining unsupervised and reinforcement learning with end-to-end neural networks. However, such utterances were a simple playback of segmented waveform sounds, which lacked flexibility in pronunciation. This work introduces the WaveGrad speech synthesizer as the agent's speech organ by embedding its optimization in the self-supervised learning framework. Experimental results show that WaveGrad achieves the same speaking performance as the conventional method in a steady environment and outperforms it when the background noise changes, demonstrating its ability to adjust its pronunciation for smoother communication.
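The abstract does not detail the synthesizer itself, but WaveGrad is publicly described as a score-based (diffusion) vocoder that iteratively refines Gaussian noise into a waveform conditioned on a mel spectrogram. The sketch below illustrates that refinement loop under generic DDPM-style assumptions; the function names, noise schedule, and conditioning shapes are hypothetical, and `predict_noise` is only a stand-in for a trained network, not the authors' implementation.

```python
# Minimal sketch of WaveGrad-style iterative refinement (DDPM sampling),
# showing how a diffusion vocoder could act as the agent's "speech organ".
# All names and hyperparameters here are illustrative assumptions.
import numpy as np


def make_schedule(num_steps=50):
    """Linear noise schedule; returns per-step beta, alpha, and cumulative alpha."""
    beta = np.linspace(1e-4, 0.05, num_steps)
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    return beta, alpha, alpha_bar


def predict_noise(y, mel, noise_level):
    """Placeholder for the learned noise-estimation network epsilon_theta.
    A real model would condition on the mel spectrogram and the noise level."""
    return np.zeros_like(y)  # stand-in; a trained network goes here


def sample_waveform(mel, wav_len, num_steps=50, rng=None):
    """Start from Gaussian noise and iteratively denoise it into a waveform,
    conditioned on the mel spectrogram the agent decides to utter."""
    rng = rng or np.random.default_rng(0)
    beta, alpha, alpha_bar = make_schedule(num_steps)
    y = rng.standard_normal(wav_len)  # y_N ~ N(0, I)
    for n in reversed(range(num_steps)):
        eps = predict_noise(y, mel, np.sqrt(alpha_bar[n]))
        # Posterior mean of y_{n-1} given y_n and the predicted noise.
        y = (y - (1.0 - alpha[n]) / np.sqrt(1.0 - alpha_bar[n]) * eps) / np.sqrt(alpha[n])
        if n > 0:  # add noise at every step except the last
            y += np.sqrt(beta[n]) * rng.standard_normal(wav_len)
    return y


if __name__ == "__main__":
    mel = np.zeros((80, 40))                      # dummy 80-bin mel spectrogram, 40 frames
    wav = sample_waveform(mel, wav_len=40 * 256)  # assumed hop size of 256 samples
    print(wav.shape)
```

Because sampling is differentiable in its conditioning input, such a vocoder can in principle be optimized end-to-end inside a self-supervised or reinforcement-learning loop, which is the role the paper assigns to WaveGrad; the details of that coupling are specific to the paper and not reproduced here.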