Pronunciation adaptive self speaking agent using WaveGrad

T. Tanaka, R. Komatsu, T. Okamoto, T. Shinozaki
Proc. 2nd Workshop on Self-Supervised Learning for Audio and Speech Processing, 2022. aaai-sas-2022.github.io
Abstract
The ability to automatically learn to speak through observation and dialogue, without relying on labeled training data, is essential for intelligent robots or agents to talk to humans flexibly and expressively on an equal footing. Previous methods have demonstrated that automatic spoken language acquisition becomes possible by combining unsupervised and reinforcement learning with end-to-end neural networks. However, such utterances were a simple playback of segmented waveform sounds, which lacked flexibility in pronunciation. This work introduces the WaveGrad speech synthesizer as the agent's speech organ by embedding its optimization in the self-supervised learning framework. Experimental results show that WaveGrad achieves the same speaking performance as the conventional method in a steady environment and outperforms it when the background noise changes, demonstrating its ability to adjust its pronunciation for smoother communication.
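The abstract does not detail the synthesizer itself, but WaveGrad is publicly described as a score-based (diffusion) vocoder that iteratively refines Gaussian noise into a waveform conditioned on a mel spectrogram. The sketch below illustrates that refinement loop under generic DDPM-style assumptions; the function names, noise schedule, and conditioning shapes are hypothetical, and `predict_noise` is only a stand-in for a trained network, not the authors' implementation.

```python
# Minimal sketch of WaveGrad-style iterative refinement (DDPM sampling),
# showing how a diffusion vocoder could act as the agent's "speech organ".
# All names and hyperparameters here are illustrative assumptions.
import numpy as np


def make_schedule(num_steps=50):
    """Linear noise schedule; returns per-step beta, alpha, and cumulative alpha."""
    beta = np.linspace(1e-4, 0.05, num_steps)
    alpha = 1.0 - beta
    alpha_bar = np.cumprod(alpha)
    return beta, alpha, alpha_bar


def predict_noise(y, mel, noise_level):
    """Placeholder for the learned noise-estimation network epsilon_theta.
    A real model would condition on the mel spectrogram and the noise level."""
    return np.zeros_like(y)  # stand-in; a trained network goes here


def sample_waveform(mel, wav_len, num_steps=50, rng=None):
    """Start from Gaussian noise and iteratively denoise it into a waveform,
    conditioned on the mel spectrogram the agent decides to utter."""
    rng = rng or np.random.default_rng(0)
    beta, alpha, alpha_bar = make_schedule(num_steps)
    y = rng.standard_normal(wav_len)  # y_N ~ N(0, I)
    for n in reversed(range(num_steps)):
        eps = predict_noise(y, mel, np.sqrt(alpha_bar[n]))
        # Posterior mean of y_{n-1} given y_n and the predicted noise.
        y = (y - (1.0 - alpha[n]) / np.sqrt(1.0 - alpha_bar[n]) * eps) / np.sqrt(alpha[n])
        if n > 0:  # add noise at every step except the last
            y += np.sqrt(beta[n]) * rng.standard_normal(wav_len)
    return y


if __name__ == "__main__":
    mel = np.zeros((80, 40))                      # dummy 80-bin mel spectrogram, 40 frames
    wav = sample_waveform(mel, wav_len=40 * 256)  # assumed hop size of 256 samples
    print(wav.shape)
```

Because sampling is differentiable in its conditioning input, such a vocoder can in principle be optimized end-to-end inside a self-supervised or reinforcement-learning loop, which is the role the paper assigns to WaveGrad; the details of that coupling are specific to the paper and not reproduced here.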