Investigating Radical-Based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese.

S Li, X Lu, C Ding, P Shen, T Kawahara, H Kawai - INTERSPEECH, 2019 - isca-archive.org
Abstract
Training automatic speech recognition (ASR) systems for East Asian languages (e.g., Chinese and Japanese) is difficult because of the characters used in their writing systems. Traditionally, the pronunciation of these characters must first be obtained through morphological analysis. The end-to-end (E2E) model allows characters or words to be used directly as the modeling unit. However, since different groups of people (e.g., residents of mainland China, Hong Kong, Taiwan, and Japan) adopt different written forms for a character, the vocabulary size increases substantially, especially when building ASR systems across languages or dialects. In this paper, we propose a new E2E ASR modeling method that decomposes characters into a set of radicals. Our experiments demonstrate that the vocabulary size can be effectively reduced by sharing basic radicals across different dialects of Chinese. Moreover, we demonstrate that this method can also be used to construct a Japanese E2E ASR system. The system modeled with radicals and kana achieved performance similar to that of a state-of-the-art E2E system built with word-piece units.
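The idea of sharing radicals to shrink the output vocabulary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the decomposition table is a small hand-written toy, the names `TOY_DECOMPOSITIONS`, `to_radicals`, and `vocabulary` are hypothetical, and the use of Unicode Ideographic Description Characters as structure markers is one plausible encoding, not necessarily the radical inventory or encoding used in the paper.

```python
# Toy, hand-written decomposition table (an assumption for illustration;
# NOT the radical inventory used in the paper).
# "⿰" marks a left-right layout and "⿱" a top-bottom layout
# (Unicode Ideographic Description Characters).
TOY_DECOMPOSITIONS = {
    "好": "⿰女子",
    "妈": "⿰女马",   # simplified form
    "媽": "⿰女馬",   # traditional form of the same character
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木⿰木木",
    "姓": "⿰女生",
    "星": "⿱日生",
    "李": "⿱木子",
    "朋": "⿰月月",
    "晶": "⿱日⿰日日",
    "如": "⿰女口",
    "呆": "⿱口木",
    "杏": "⿱木口",
    "品": "⿱口⿰口口",
}


def to_radicals(text, table=TOY_DECOMPOSITIONS):
    """Replace each known character with its radical sequence.

    Characters missing from the table (for example kana) pass through unchanged.
    """
    return [unit for ch in text for unit in table.get(ch, ch)]


def vocabulary(sequences):
    """Collect the set of distinct modeling units across all sequences."""
    return {unit for seq in sequences for unit in seq}


# Compare the character-level vocabulary with the radical-level one.
char_vocab = set(TOY_DECOMPOSITIONS)
radical_vocab = vocabulary(to_radicals(ch) for ch in TOY_DECOMPOSITIONS)

print(len(char_vocab), len(radical_vocab))
print(to_radicals("好き"))
```

On this toy table, 15 characters reduce to 11 modeling units, and the simplified and traditional forms 妈 and 媽 share the units ⿰ and 女. The trade-off is longer output sequences, since each character expands into several units, while kana pass through as single units.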