Pair-Wise Distance Metric Learning of Neural Network Model for Spoken Language Identification

X Lu, P Shen, Y Tsao, H Kawai - INTERSPEECH, 2016 - academia.edu

Abstract

The i-vector representation and modeling technique has been successfully applied in spoken language identification (SLI). Because the i-vector representation encodes most acoustic variations (e.g., speaker variation and transmission channel variation), a discriminative transform or classifier must be applied in modeling to emphasize the variations correlated with language identity. Owing to the strong nonlinear discriminative power of neural network (NN) modeling (including its deep form, the DNN), the NN has been used directly to learn the mapping function between the i-vector representation and language identity labels. In most studies, only point-wise feature-label information is fed to the NN for parameter learning, which may result in model overfitting, particularly when training data are limited. In this study, we propose to integrate pair-wise distance metric learning into NN parameter optimization. In the representation space of the nonlinear transforms of the hidden layers, a distance metric learning objective is explicitly designed to minimize the pair-wise intra-class variation and maximize the inter-class variation. With the distance metric as a constraint in the point-wise learning, the i-vectors are transformed to a new feature space that is much more discriminative for samples belonging to different languages and much more similar for samples belonging to the same language. We tested the algorithm on an SLI task and obtained encouraging results, with more than 20% relative improvement in identification error rate.
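The idea of constraining point-wise learning with a pair-wise distance term can be illustrated with a minimal sketch. The abstract does not give the exact loss, so everything here is an assumption made for illustration: the contrastive-style hinge form of the pair-wise term, the function names, and the hypothetical `weight` and `margin` parameters. It uses plain Python lists in place of hidden-layer outputs and is not the authors' formulation.

```python
import math

def softmax_cross_entropy(logits, label):
    """Point-wise loss: cross-entropy of one sample against its language label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def squared_distance(a, b):
    """Squared Euclidean distance between two hidden representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pairwise_metric_loss(hidden, labels, margin=1.0):
    """Pair-wise term (assumed contrastive form, not taken from the paper).

    Same-language pairs are penalised by their squared distance;
    different-language pairs are penalised only when closer than `margin`.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(hidden)):
        for j in range(i + 1, len(hidden)):
            d2 = squared_distance(hidden[i], hidden[j])
            if labels[i] == labels[j]:
                total += d2  # pull intra-class pairs together
            else:
                total += max(0.0, margin - math.sqrt(d2)) ** 2  # push inter-class pairs apart
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0

def combined_loss(logits_batch, hidden, labels, weight=0.5, margin=1.0):
    """Mean point-wise loss plus a weighted pair-wise constraint (hypothetical weighting)."""
    point = sum(softmax_cross_entropy(z, y)
                for z, y in zip(logits_batch, labels)) / len(labels)
    pair = pairwise_metric_loss(hidden, labels, margin)
    return point + weight * pair
```

In this sketch the pair-wise term is zero when same-language samples coincide and different-language samples lie at least `margin` apart, so it only adds to the point-wise loss when the hidden representation fails to separate the languages.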
