Translation quality estimation using only bilingual corpora

L Liu, A Fujita, M Utiyama, A Finch… - IEEE/ACM Transactions …, 2017 - ieeexplore.ieee.org
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017
In computer-aided translation scenarios, quality estimation of machine translation hypotheses plays a critical role. Existing methods for word-level translation quality estimation (TQE) rely on the availability of manually annotated TQE training data obtained via direct annotation or postediting. However, due to the cost of human labor, such data are either limited in size or available for only a few tasks in practice. To avoid reliance on such annotated TQE data, this paper proposes an approach to train word-level TQE models using bilingual corpora, which are typically used in machine translation training and are relatively easy to access. We formalize the training of our proposed method under the framework of maximum marginal likelihood estimation. To avoid degenerate solutions, we propose a novel regularized training objective whose optimization is achieved by an efficient approximation. Extensive experiments on both written and spoken language datasets empirically show that our approach yields performance comparable to standard training on annotated data.
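To make concrete what "annotated TQE training data obtained via postediting" looks like, here is a minimal sketch of turning a post-edit into word-level labels. It assumes whitespace tokenization and a binary OK/BAD tag set, and `tags_from_postedit` is a hypothetical helper. Shared-task datasets typically derive such tags from a TER-based alignment. This sketch instead uses Python's `difflib.SequenceMatcher` as a simpler stand-in. It illustrates the kind of annotated data the paper aims to avoid, not the paper's own bilingual-corpus training method.

```python
from difflib import SequenceMatcher


def tags_from_postedit(hypothesis, postedit):
    """Hypothetical helper: tag each MT hypothesis token OK if it is kept
    verbatim in the human post-edit, otherwise BAD.

    Approximation: difflib's matching-block alignment replaces the
    TER-style alignment usually used to build word-level QE labels.
    """
    hyp = hypothesis.split()   # assumption: whitespace tokenization
    pe = postedit.split()
    tags = ["BAD"] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=pe, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is Match(a, b, size): hyp[a:a+size] == pe[b:b+size].
        # The final sentinel block has size 0 and marks nothing.
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


hyp = "the cat sat in the mat"
pe = "the cat sat on the mat"
print(list(zip(hyp.split(), tags_from_postedit(hyp, pe))))
# [('the', 'OK'), ('cat', 'OK'), ('sat', 'OK'), ('in', 'BAD'), ('the', 'OK'), ('mat', 'OK')]
```

Producing labels like these requires a human post-edit for every hypothesis. That per-sentence labor cost is why the abstract proposes learning word-level TQE from bilingual corpora instead.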