IWSLT 2009 Corpus Release (for IWSLT 2009 participants only)
User License Agreement | DOC, PDF |
Download |
CHALLENGE Task BTEC Task |
To get access to the corpus, please follow the procedure below. Access will be enabled only after we have received your original signed user license agreement.
- download the post-workshop user license agreement (click on the DOC/PDF link above), sign it, and send two copies to:
Michael Paul
National Institute of Information and Communications Technology
Knowledge Creating Communication Research Center
MASTAR Project
Language Translation Group
3-5 Hikaridai, "Keihanna Science City"
Kyoto 619-0289, Japan
- download the corpus files using the ID and Password you obtained for the download of the training data files for IWSLT 2009.
Corpus Data Files
- train:
- (BTEC) 20K sentence pairs of translation examples with case and punctuation information, segmented according to the ASR engine used
- (CHALLENGE) in addition to BTEC@train, 10K sentence pairs of translation examples with dialog annotations
- up to 6 evaluation data sets containing 500 source language sentences with multiple references and ASR output data files (= testsets of previous IWSLT evaluation campaigns)
- 500 source language sentences and ASR output data files (= input of run-submissions of this year's evaluation campaign)
- preprocessing scripts (tokenization, NBEST extraction, etc.) used to prepare the data sets
CHALLENGE | |||
---|---|---|---|
Chinese-English | |||
train | dev | test | tools |
TGZ | TGZ | TGZ | TGZ |
English-Chinese | |||
train | dev | test | tools |
TGZ | TGZ | TGZ | TGZ |
BTEC | |||
---|---|---|---|
Arabic-to-English | |||
train | dev | test | tools |
TGZ | TGZ | TGZ | TGZ |
Chinese-to-English | |||
train | dev | test | tools |
TGZ | TGZ | TGZ | TGZ |
Turkish-to-English | |||
train | dev | test | tools |
TGZ | TGZ | TGZ | TGZ |
Templates for LaTeX/MSWord
- Gzipped TAR archive (all template files): latex_template_iwslt09.tgz
- LaTeX style: iwslt09.sty
- Example document: template.tex
- Example document PS: template.ps
- Example document PDF: template.pdf
- Bibliography style: IEEEtran.bst
- MS-Word template: template.doc