Training data pseudo-shuffling and direct decoding framework for recurrent neural network based acoustic modeling

N Kanda, M Tachimori, X Lu, H Kawai - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015 - ieeexplore.ieee.org
We propose two techniques to enhance the performance of recurrent neural network (RNN)-based acoustic models. The first technique addresses training efficiency. Because RNNs require sequential input, it is difficult to randomly shuffle training samples to accelerate stochastic gradient descent (SGD) based training. We propose a "pseudo-shuffling" procedure that instead increases the unexpectedness of training samples by skipping successive samples. The second technique is a novel "direct decoding" framework in which the posterior probability of the RNN is fed to the decoder without conversion into a hidden Markov model emission probability. In our large-vocabulary speech recognition experiments with English lecture recordings, the first technique significantly improved RNN training efficiency, yielding a 14.3% relative word error rate (WER) improvement. The second technique achieved an additional 3.1% relative WER improvement. Our sigmoid-type RNN achieved a 10.7% better WER than same-sized deep neural networks without using long short-term memory cells.
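The abstract does not spell out the pseudo-shuffling procedure; the following is a minimal sketch of one plausible reading, in which the training order strides through the corpus with a fixed skip so that successive minibatches come from distant positions rather than from neighboring (and thus highly correlated) samples. The function name, the skip parameter, and the stride-and-wrap scheme are illustrative assumptions, not the authors' exact algorithm.

```python
def pseudo_shuffle_order(num_samples: int, skip: int) -> list[int]:
    """Return a visiting order that strides through the data with `skip`,
    wrapping around so that every sample is visited exactly once.
    Sketch of one possible skip-based pseudo-shuffling scheme (assumed)."""
    order = []
    for offset in range(skip):
        # Take every `skip`-th sample starting from a different offset,
        # so consecutive training steps see samples far apart in the corpus.
        order.extend(range(offset, num_samples, skip))
    return order

# Example: 10 samples with skip=3 -> [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
print(pseudo_shuffle_order(10, 3))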
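For context on the direct decoding framework: a conventional hybrid system converts the network's posterior p(state | x) into a scaled likelihood by dividing by the state prior before handing scores to the HMM decoder, whereas direct decoding passes the posterior to the decoder unchanged. The sketch below contrasts the two scorings in the log domain; the function and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hybrid_scores(log_posteriors: np.ndarray, log_priors: np.ndarray) -> np.ndarray:
    """Pseudo log-likelihoods for a conventional hybrid RNN-HMM system:
    subtract the log state prior (i.e., divide by the prior)."""
    return log_posteriors - log_priors

def direct_scores(log_posteriors: np.ndarray) -> np.ndarray:
    """Direct-decoding scores: the RNN posterior is used without conversion
    into an HMM emission probability."""
    return log_posteriors
```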