Bottleneck linear transformation network adaptation for speaker adaptive training-based hybrid DNN-HMM speech recognizer

T. Ochiai, S. Matsuda, H. Watanabe, X. Lu, H. Kawai, S. Katagiri
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016 - ieeexplore.ieee.org
Recently, a hybrid DNN-HMM recognizer trained with the Speaker Adaptive Training (SAT) concept was successfully modified into a more effective speaker-adaptation-oriented recognizer whose DNN front-end adopted a Linear Transformation Network (LTN) Speaker Dependent (SD) module. However, the size of the SD modules is still large, which incurs high storage costs and a risk of over-training. To alleviate this problem, we analyze the characteristics of an LTN module, focusing on the relation between its size and its feature-representation capability. Moreover, we propose a new SAT-based scheme that reduces the LTN size using SVD-based matrix compression. Evaluation experiments on the TED Talks corpus show that our LTN size-reduction scheme not only maintains the adaptation performance of the original LTN-embedded, SAT-based DNN-HMM recognizer but also further improves it, especially when the speech data available for adaptation training are severely limited.
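The abstract does not spell out the details of the proposed size-reduction scheme; the following NumPy sketch only illustrates the general idea of SVD-based low-rank compression of a linear transformation matrix, where a single square SD layer is replaced by two thin factors. The 440-dimensional matrix size and rank 64 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def svd_compress(W, rank):
    """Approximate a weight matrix W with a rank-`rank` factorization.

    Returns factors (A, B) such that W ~= A @ B, so a single linear layer
    y = W @ x can be replaced by two thinner layers y = A @ (B @ x),
    reducing the speaker-dependent parameter count from n*m to rank*(n + m).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the leading singular values/vectors.
    U_k, s_k, Vt_k = U[:, :rank], s[:rank], Vt[:rank, :]
    # Split the singular values evenly between the two factors.
    A = U_k * np.sqrt(s_k)            # shape (n, rank)
    B = np.sqrt(s_k)[:, None] * Vt_k  # shape (rank, m)
    return A, B

# Example: a hypothetical 440x440 near-identity SD transform compressed to rank 64.
rng = np.random.default_rng(0)
W = np.eye(440) + 0.01 * rng.standard_normal((440, 440))
A, B = svd_compress(W, rank=64)
print(W.size, A.size + B.size)                         # parameters before/after
print(np.linalg.norm(W - A @ B) / np.linalg.norm(W))   # relative approximation error
```

After such a factorization, only the two small matrices need to be stored per speaker, which is one common way to cut both storage cost and the number of parameters estimated from limited adaptation data.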