
Downloads


IWSLT 2009 Corpus Release (for IWSLT 2009 participants only)

User License Agreement DOC, PDF
Download
  CHALLENGE Task
  • Chinese ↔ English
  BTEC Task
  • Arabic → English
  • Chinese → English
  • Turkish → English

To get access to the corpus, please follow the procedure below. Access will be enabled only AFTER we have received your original signed user license agreement.

  1. Download the post-workshop user license agreement (click the DOC/PDF link above), sign two copies, and send them to:

    Michael Paul
    National Institute of Information and Communications Technology
    Knowledge Creating Communication Research Center
    MASTAR Project
    Language Translation Group
    3-5 Hikaridai, "Keihanna Science City"
    Kyoto 619-0289, Japan

  2. Download the corpus files using the ID and password you obtained for downloading the IWSLT 2009 training data files.


Corpus Data Files

- train:
  • (BTEC) 20K sentence pairs of translation examples with case and punctuation information, segmented according to the ASR engine used
  • (CHALLENGE) in addition to BTEC@train, 10K sentence pairs of translation examples with dialog annotations
- dev:
  • up to 6 evaluation data sets containing 500 source-language sentences with multiple references, plus ASR output data files (= test sets of previous IWSLT evaluation campaigns)
- test:
  • 500 source-language sentences and ASR output data files (= input for the run submissions of this year's evaluation campaign)
- tools:
  • preprocessing scripts (tokenization, NBEST extraction, etc.) used to prepare the data sets

For details on each data set, click the name of the translation direction.


CHALLENGE
Direction           train  dev  test  tools
Chinese-English     TGZ    TGZ  TGZ   TGZ
English-Chinese     TGZ    TGZ  TGZ   TGZ

BTEC
Direction           train  dev  test  tools
Arabic-to-English   TGZ    TGZ  TGZ   TGZ
Chinese-to-English  TGZ    TGZ  TGZ   TGZ
Turkish-to-English  TGZ    TGZ  TGZ   TGZ
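All corpus files above are distributed as gzipped TAR archives (TGZ), which any standard tar tool can unpack. As a minimal sketch, assuming Python 3.9+ and that an archive has already been downloaded locally, the following uses the standard-library tarfile module to extract it and refuses any member that would be written outside the target directory. The function name extract_tgz is hypothetical, and nothing here reflects the actual file names or internal layout of the IWSLT archives.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive: Path, dest: Path) -> list[str]:
    """Extract a gzipped TAR archive into dest and return its member names.

    Hypothetical helper: raises ValueError if any member would be written
    outside dest (e.g. '../x' or an absolute path).
    """
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        # Check every member path before extracting anything.
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        if hasattr(tarfile, "data_filter"):
            # Python 3.12+ (and security backports): also apply the safe "data" filter.
            tar.extractall(dest, members=members, filter="data")
        else:
            tar.extractall(dest, members=members)
    return [m.name for m in members]
```

For example, `extract_tgz(Path("downloaded_archive.tgz"), Path("corpus"))` (both paths hypothetical) unpacks one archive into a local corpus directory. The same helper works for the template archive below.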



Templates for LaTeX/MSWord

- Gzipped TAR archive (all template files): latex_template_iwslt09.tgz

- LaTeX style: iwslt09.sty
- Example document: template.tex
- Example document PS: template.ps
- Example document PDF: template.pdf
- Bibliography style: IEEEtran.bst
- MS-Word template: template.doc