
Theme

Spoken language translation technologies aim to bridge the language barrier between people with different native languages who want to converse, each in their own mother tongue. These technologies are becoming increasingly important as opportunities for cross-language communication grow in face-to-face and telephone conversation, especially in the tourism domain.

Novel technologies have been proposed to tackle the problems of spoken language translation. A number of institutes are building large bilingual or multilingual spoken language corpora, and MT technologies based on machine learning, such as statistical MT and example-based MT, are being applied to spoken language translation using these corpora. Compared with written language, some characteristics of spoken language appear well suited to machine-learning-based MT. However, there is still no concrete, standard methodology for comparing the translation quality of spoken language translation systems.

One of the prominent research activities in spoken language translation is the work being conducted by the Consortium for Speech Translation Advanced Research (C-STAR III), an international partnership of research laboratories engaged in automatic translation of spoken language. Current members include ATR (Japan), CAS (China), CLIPS (France), CMU (USA), ETRI (Korea), ITC-irst (Italy), and UKA (Germany). One of C-STAR's ongoing projects is the joint development of a speech corpus that covers a common task in multiple languages. Such a corpus will not only enable translation among multiple languages but will also facilitate the exchange and discussion of research results among member labs. As a first result of this activity, a Japanese-English speech corpus of tourism-related sentences, originally compiled by ATR, has been translated into the native languages of the C-STAR members.

In this workshop, an "evaluation campaign" of spoken language translation technologies will be held using the multilingual speech corpus of tourism-related sentences developed by ATR and the C-STAR members. Two types of submissions are invited: 1) participation in the evaluation campaign of spoken language translation technologies; and 2) technical papers on related issues. An overview of the evaluation campaign is as follows:

Main Theme: Evaluation of spoken language translation systems

Corpus Used for the Evaluation Campaign:

  • Basic Travel Expression Corpus (BTEC)

  • Language Pairs: Chinese-English, Japanese-English

  • Domain: tourism-related sentences

  • Media: text in utterance style

  • Number of Sentence Pairs: 20,000 for each translation direction
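
As a rough illustration of how such a corpus might be handled, the short Python sketch below reads a sentence-aligned pair of plain-text files into (source, target) pairs. The file names and the one-sentence-per-line layout are assumptions for illustration only, not the actual distribution format of the supplied corpus.

    # Minimal sketch: load a sentence-aligned parallel corpus.
    # The file names and the one-sentence-per-line format are
    # assumptions, not the official format of the supplied corpus.
    def load_parallel_corpus(src_path, tgt_path, encoding="utf-8"):
        """Return a list of (source, target) sentence pairs."""
        with open(src_path, encoding=encoding) as src_file, \
             open(tgt_path, encoding=encoding) as tgt_file:
            pairs = [(s.strip(), t.strip()) for s, t in zip(src_file, tgt_file)]
        return pairs

    # Hypothetical usage:
    # pairs = load_parallel_corpus("btec_train.ja", "btec_train.en")
    # print(len(pairs))  # on the order of 20,000 sentence pairs per direction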

Tracks of the Evaluation Campaign:
  • Translation Directions:
    • Chinese to English
    • Japanese to English

  • Resources Used:
    • Supplied corpus only (C-to-E, J-to-E)
    • Supplied corpus + additional linguistic resources available from LDC (C-to-E)
    • Unrestricted (C-to-E, J-to-E)

Evaluation Methodology of Translated Results:
  • Subjective Evaluation

  • Automatic Evaluation (BLEU, NIST, WER, etc.)
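
For illustration, the Python sketch below computes one of the simpler automatic measures, word error rate (WER), as the word-level edit distance between a system output and a single reference translation, normalised by the reference length. This is only a sketch of the general idea; BLEU and NIST scores are normally obtained with the standard evaluation tools rather than re-implemented, and the exact evaluation setup of the campaign is defined by the organizers.

    # Minimal sketch: word error rate (WER) against a single reference.
    # WER = (substitutions + insertions + deletions) / reference length.
    def word_error_rate(hypothesis, reference):
        hyp, ref = hypothesis.split(), reference.split()
        # Dynamic-programming edit distance over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution / match
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    # Hypothetical usage:
    # print(word_error_rate("i would like a room", "i would like to book a room"))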

The workshop also invites technical papers related to spoken language translation. Possible topics for the session include, but are not limited to:

  • MT Evaluation Measures

  • MT Algorithms

  • Word / Phrase Alignment

  • Multilingual Lexicon / Translation Rule Extraction

  • Multilingual Parsing