To support the development of state-of-the-art machine translation systems, IWSLT will provide links to relevant open software tools as well as a limited amount of linguistic resources (20k sentence pairs for each translation direction). In addition, a list of links to permitted public resources is available on the resources page. Participants of IWSLT 2008 are encouraged to look for additional public resources and to report them to the evaluation campaign chair. All declared resources may be used for training the systems. Each participant in the evaluation campaign is requested to submit a paper describing the MT system, the resources used, and the results obtained on the provided test data. Contrastive run submissions that use only the bilingual resources provided by IWSLT, as well as investigations into the contribution of each resource used, are highly appreciated. Moreover, all participants are requested to present their papers at the workshop.
The focus of this year's evaluation campaign will be the translation of spontaneous speech recorded in a real situation. Foreign travelers were given a state-of-the-art hand-held speech-to-speech translation device and were asked to carry out specific tourism-related tasks (e.g., buying entrance tickets), using the device to communicate with local staff. Speech data was collected from 50 English and 50 Chinese travelers at 5 different locations, each carrying out 3-4 tasks. For the Challenge Task, IWSLT participants will have to translate the Chinese/English output of the automatic speech recognizers (lattice, N/1BEST) into English/Chinese, respectively.
Another innovative aspect of this year's edition concerns the feasibility of pivot-language-based approaches. In the Pivot Task, participants will be provided with read-speech recordings (lattice, N/1BEST) of Chinese utterances from the travel domain and will have to apply Chinese-English and English-Spanish systems to produce the Spanish output. The results will be compared to those of direct Chinese-Spanish translation.
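As an illustration only, the cascade described for the Pivot Task can be sketched in Python as below. The campaign does not prescribe any particular method; the function `pivot_translate`, the toy translators `toy_zh_en` and `toy_en_es`, their lookup tables, and all scores are hypothetical placeholders. The sketch assumes each system returns an N-best list of (text, log-probability) pairs and ranks each Spanish candidate by its best-scoring path through English.

```python
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[str, float]               # (text, log-probability)
Translator = Callable[[str], List[Hypothesis]]


def pivot_translate(source: str,
                    src_to_pivot: Translator,
                    pivot_to_tgt: Translator,
                    top_k: int = 1) -> List[Hypothesis]:
    """Cascade two translators through a pivot language.

    Assumption: the score of a target candidate is the best (maximum)
    sum of log-probabilities over all pivot hypotheses that lead to it.
    """
    best: Dict[str, float] = {}
    for pivot_text, pivot_score in src_to_pivot(source):
        for tgt_text, tgt_score in pivot_to_tgt(pivot_text):
            score = pivot_score + tgt_score
            if tgt_text not in best or score > best[tgt_text]:
                best[tgt_text] = score
    # Highest-scoring candidates first.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical stand-ins for real Chinese-English and English-Spanish systems.
def toy_zh_en(text: str) -> List[Hypothesis]:
    table = {"你好": [("hello", -0.1), ("hi", -0.9)]}
    return table.get(text, [])


def toy_en_es(text: str) -> List[Hypothesis]:
    table = {
        "hello": [("hola", -0.2)],
        "hi": [("hola", -0.3), ("buenas", -1.0)],
    }
    return table.get(text, [])


result = pivot_translate("你好", toy_zh_en, toy_en_es, top_k=2)
```

Passing an N-best list rather than only the single best English hypothesis lets the second system recover from errors made by the first, at the cost of more translation calls.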
As in previous IWSLT events, a standard BTEC task, i.e., the translation of read-speech recordings (lattice, N/1BEST) and correct recognition results (text) of frequently used utterances in the travel domain, will be provided for Arabic-English and Chinese-English.
In addition to the evaluation campaign, the IWSLT 2008 workshop also invites scientific paper submissions related to spoken language technologies. Possible topics include, but are not limited to:
- Text and speech translation systems
- Search algorithms for MT
- Phrase alignment methods for MT
- Re-ordering models for MT
- Semantic models for MT
- Syntax-based MT
- Pivot-language-based MT
- MT evaluation
- Integration of ASR and MT
- Open source software for MT
- Language resources for MT
- Task adaptation and portability in MT