Maximum a posteriori Based Decoding for CTC Acoustic Models

N Kanda, X Lu, H Kawai - Interspeech, 2016 - isca-archive.org
Abstract
This paper presents a novel decoding framework for connectionist temporal classification (CTC)-based acoustic models (AMs). Although a CTC-based AM inherently has the property of a language model (LM), an external LM trained on a large text corpus is still essential to obtain the best results. In the previous literature, a naive interpolation of the CTC-based AM score and the external LM score has been used, although there is no theoretical justification for it. In this paper, we propose a theoretically sounder decoding framework derived from maximizing the posterior probability of a word sequence given an observation. In our decoding framework, a subword LM (SLM) is newly introduced to coordinate the CTC-based AM score and the word-level LM score. In experiments with the Wall Street Journal (WSJ) corpus and the Corpus of Spontaneous Japanese (CSJ), our proposed framework consistently achieved improvements of 7.4–15.3% over the conventional interpolation-based framework. In the CSJ experiment, given 586 hours of training data, the CTC-based AM finally achieved a 6.7% better word error rate than the baseline method with deep neural networks and hidden Markov models.
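
The contrast between the two scoring schemes can be sketched in the log domain. This is only an illustrative reading of the abstract, not the paper's exact formulation: it assumes each word sequence W maps to a single subword sequence s, so that P(W|X) is approximated by P(s|X) P(W) / P(s), with the SLM supplying P(s). The function names, the weights `lm_weight` and `slm_weight`, and the example numbers are all hypothetical.

```python
def interpolation_score(log_p_ctc, log_p_word_lm, lm_weight=1.0):
    """Conventional scheme: CTC AM log-score plus a weighted word-LM log-score."""
    return log_p_ctc + lm_weight * log_p_word_lm


def map_score(log_p_ctc, log_p_word_lm, log_p_subword_lm,
              lm_weight=1.0, slm_weight=1.0):
    """Assumed MAP-style scheme: as above, but the subword-LM log-score is
    subtracted, i.e. the CTC score is divided by the subword prior P(s)."""
    return (log_p_ctc
            + lm_weight * log_p_word_lm
            - slm_weight * log_p_subword_lm)


def pick_best(hypotheses, score_fn):
    """Return the word sequence of the highest-scoring hypothesis."""
    return max(hypotheses, key=score_fn)["words"]


# Made-up log-probabilities for two competing hypotheses (illustration only).
hypotheses = [
    {"words": "a", "ctc": -1.0, "lm": -2.0, "slm": -0.5},
    {"words": "b", "ctc": -1.5, "lm": -2.0, "slm": -2.0},
]

best_interp = pick_best(
    hypotheses, lambda h: interpolation_score(h["ctc"], h["lm"]))
best_map = pick_best(
    hypotheses, lambda h: map_score(h["ctc"], h["lm"], h["slm"]))
```

With these toy numbers the two schemes rank the hypotheses differently, because subtracting the subword-LM term discounts hypotheses whose subword sequence is already probable under the prior; in practice both weights would be tuned on held-out data.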