Google Scholar

[PDF][PDF] Maximum a posteriori Based Decoding for CTC Acoustic Models.

N Kanda, X Lu, H Kawai - Interspeech, 2016 - isca-archive.org

Interspeech, 2016•isca-archive.org

Abstract

This paper presents a novel decoding framework for connectionist temporal classification (CTC)-based acoustic models (AM). Although CTC-based AM inherently has the property of a language model (LM) in itself, an external LM trained with a large text corpus is still essential to obtain the best results. In the previous literatures, a naive interpolation of the CTC-based AM score and the external LM score was used, although there is no theoretical justification for it. In this paper, we propose a theoretically more sound decoding framework derived from a maximization of the posterior probability of a word sequence given an observation. In our decoding framework, a subword LM (SLM) is newly introduced to coordinate the CTC-based AM score and the word-level LM score. In experiments with the Wall Street Journal (WSJ) corpus and Corpus of Spontaneous Japanese (CSJ), our proposed framework consistently achieved improvements of 7.4–15.3% over the conventional interpolation-based framework. In the CSJ experiment, given 586 hours of training data, the CTC-based AM finally achieved a 6.7% better word error rate than the baseline method with deep neural networks and hidden Markov models.

isca-archive.org

Show moreShow less

Save Cite Cited by 54 Related articles All 2 versions View as HTML

Showing the best result for this search. See all results

Cite

Advanced search

Saved to My library

[PDF][PDF] Maximum a posteriori Based Decoding for CTC Acoustic Models.