Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling

N Kanda, M Tachimori, X Lu, H Kawai - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015 - ieeexplore.ieee.org
We propose two techniques to enhance the performance of recurrent neural network (RNN)-based acoustic models. The first technique addresses training efficiency. Because RNNs require sequential input, it is difficult to randomly shuffle training samples to accelerate stochastic gradient descent (SGD) based training. We propose a "pseudo-shuffling" procedure that instead increases the unexpectedness of training samples by skipping successive samples. The second technique is a novel "direct decoding" framework in which the posterior probability of the RNN is fed to the decoder without conversion into a hidden Markov model emission probability. In our large-vocabulary speech recognition experiments with English lecture recordings, the first technique significantly improved RNN training efficiency, yielding a 14.3% relative word error rate (WER) improvement. The second technique achieved an additional 3.1% relative WER improvement. Our sigmoid-type RNN achieved a 10.7% better WER than same-sized deep neural networks without using long short-term memory cells.
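The abstract does not spell out the pseudo-shuffling procedure; the following is a minimal sketch of one plausible reading, in which the training order strides through the corpus with a fixed skip so that successive minibatches come from distant positions rather than from neighboring (and thus highly correlated) samples. The function name, the skip parameter, and the stride-and-wrap scheme are illustrative assumptions, not the authors' exact algorithm.

```python
def pseudo_shuffle_order(num_samples: int, skip: int) -> list[int]:
    """Return a visiting order that strides through the data with `skip`,
    wrapping around so that every sample is visited exactly once.
    Sketch of one possible skip-based pseudo-shuffling scheme (assumed)."""
    order = []
    for offset in range(skip):
        # Take every `skip`-th sample starting from a different offset,
        # so consecutive training steps see samples far apart in the corpus.
        order.extend(range(offset, num_samples, skip))
    return order

# Example: 10 samples with skip=3 -> [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
print(pseudo_shuffle_order(10, 3))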
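For context on the direct decoding framework: a conventional hybrid system converts the network's posterior p(state | x) into a scaled likelihood by dividing by the state prior before handing scores to the HMM decoder, whereas direct decoding passes the posterior to the decoder unchanged. The sketch below contrasts the two scorings in the log domain; the function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hybrid_scores(log_posteriors: np.ndarray, log_priors: np.ndarray) -> np.ndarray:
    """Pseudo log-likelihoods for a conventional hybrid RNN-HMM system:
    subtract the log state prior (i.e., divide by the prior)."""
    return log_posteriors - log_priors

def direct_scores(log_posteriors: np.ndarray) -> np.ndarray:
    """Direct-decoding scores: the RNN posterior is used without conversion
    into an HMM emission probability."""
    return log_posteriors
```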