Corpus Specifications
Training Corpus:
- data format (a parsing sketch follows at the end of this section):
  - each line consists of three fields separated by the character '\'
  - sentences consist of words separated by single spaces
  - Field_1: sentence ID
  - Field_2: paraphrase ID
  - Field_3: MT training sentence
  - format: <SENTENCE_ID>\01\<MT_TRAINING_SENTENCE>
  - example:
      TRAIN_00001\01\This is the first training sentence.
      TRAIN_00002\01\This is the second training sentence.
- language pairs:
  - Arabic-English (AE)
  - Chinese-English (CE)
  - Chinese-Spanish (CS)
  - Chinese-(English)-Spanish (CES)
  - English-Chinese (EC)
- 20K sentences randomly selected from the BTEC corpus
- coding: UTF-8
- word segmentation according to the ASR output segmentation
- text is case-sensitive and includes punctuation marks
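
A minimal Python sketch of how a training file in the format above can be parsed into (sentence ID, paraphrase ID, sentence) triples; this is illustrative only, not one of the official tools, and the file name 'train.en' is an assumption:

  # parse the three-field format <SENTENCE_ID>\01\<MT_TRAINING_SENTENCE>
  # (illustrative sketch; not an official tool)
  def parse_corpus(path):
      """Yield (sentence_id, paraphrase_id, sentence) triples."""
      with open(path, encoding="utf-8") as f:
          for line in f:
              line = line.rstrip("\n")
              if not line:
                  continue
              # split on the first two '\' only, so the sentence text stays intact
              sent_id, para_id, sentence = line.split("\\", 2)
              yield sent_id, para_id, sentence

  if __name__ == "__main__":
      for sent_id, para_id, sentence in parse_corpus("train.en"):  # assumed file name
          print(sent_id, para_id, sentence)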
Development Corpus:
- ASR output (word lattices, N-BEST, and 1-BEST lists), correct recognition result transcripts (text), and reference translations of previous IWSLT test data sets
- data format:
  - 1-BEST
    - each line consists of three fields separated by the character '\'
    - sentences consist of words separated by single spaces
    - Field_1: sentence ID
    - Field_2: paraphrase ID
    - Field_3: best recognition hypothesis
    - format: <SENTENCE_ID>\01\<MT_INPUT_SENTENCE>
    - example (input):
        DEV_001\01\best ASR hypothesis for 1st input
        DEV_002\01\best ASR hypothesis for 2nd input
        ...
  - N-BEST
    - each line consists of three fields separated by the character '\'
    - sentences consist of words separated by single spaces
    - Field_1: sentence ID
    - Field_2: N-BEST ID (max: 20)
    - Field_3: recognition hypothesis
    - format: <SENTENCE_ID>\<NBEST_ID>\<MT_INPUT_SENTENCE>
    - example (a reading sketch follows at the end of this section):
        DEV_001\01\best ASR hypothesis for the 1st input
        DEV_001\02\2nd-best ASR hypothesis for the 1st input
        ...
        DEV_001\20\20th-best ASR hypothesis for the 1st input
        DEV_002\01\best ASR hypothesis for the 2nd input
        ...
  - word lattices → HTK Standard Lattice Format (SLF); an illustrative fragment follows this list
  - reference translations
    - each line consists of three fields separated by the character '\'
    - sentences consist of words separated by single spaces
    - Field_1: sentence ID
    - Field_2: paraphrase ID
    - Field_3: reference translation
    - format: <SENTENCE_ID>\<PARAPHRASE_ID>\<REFERENCE>
    - example:
        DEV_001\01\1st reference translation for 1st input
        DEV_001\02\2nd reference translation for 1st input
        ...
        DEV_002\01\1st reference translation for 2nd input
        DEV_002\02\2nd reference translation for 2nd input
        ...
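
For orientation, a hand-written fragment in HTK Standard Lattice Format; it is not taken from the corpus, and all node times, words, and acoustic/LM scores (a=.../l=...) are invented for illustration:

  VERSION=1.0
  UTTERANCE=DEV_001
  lmscale=15.0
  N=4  L=4
  I=0  t=0.00
  I=1  t=0.32
  I=2  t=0.55
  I=3  t=0.81
  J=0  S=0  E=1  W=this  a=-201.5  l=-2.31
  J=1  S=1  E=2  W=is    a=-150.2  l=-1.08
  J=2  S=1  E=2  W=was   a=-155.9  l=-1.87
  J=3  S=2  E=3  W=nice  a=-310.4  l=-2.95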
- available development data per language pair:
  - Arabic-English
    - CSTAR03 testset: 506 sentences, 16 reference translations
    - IWSLT04 testset: 500 sentences, 16 reference translations
    - IWSLT05 testset: 506 sentences, 16 reference translations
    - IWSLT06 devset: 489 sentences, 7 reference translations
    - IWSLT06 testset: 500 sentences, 7 reference translations
    - IWSLT07 testset: 489 sentences, 6 reference translations
  - Chinese-English
    - CSTAR03 testset: 506 sentences, 16 reference translations
    - IWSLT04 testset: 500 sentences, 16 reference translations
    - IWSLT05 testset: 506 sentences, 16 reference translations
    - IWSLT06 devset: 489 sentences, 7 reference translations
    - IWSLT06 testset: 500 sentences, 7 reference translations
    - IWSLT07 testset: 489 sentences, 6 reference translations
    - IWSLT08 devset: 250 sentences of the field-experiment data (challenge task)
  - Chinese-Spanish
  - Chinese-(English)-Spanish
    - IWSLT05 testset: 506 sentences, 16 reference translations
  - English-Chinese
    - IWSLT05 testset: 506 sentences, 16 reference translations
    - IWSLT08 devset: 250 sentences of the field-experiment data (challenge task)
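A small Python sketch (illustrative, not an official tool; 'dev.nbest' is an assumed file name) that reads an N-BEST file in the format above and groups hypotheses per sentence ID in rank order:

  # group N-BEST hypotheses (<SENTENCE_ID>\<NBEST_ID>\<HYPOTHESIS>) by sentence ID
  def read_nbest(path):
      """Return {sentence_id: [best_hyp, 2nd_best_hyp, ...]}."""
      nbest = {}
      with open(path, encoding="utf-8") as f:
          for line in f:
              line = line.rstrip("\n")
              if not line:
                  continue
              sent_id, rank, hyp = line.split("\\", 2)
              nbest.setdefault(sent_id, []).append((int(rank), hyp))
      # order by N-BEST ID (at most 20 per sentence) in case lines are unsorted
      return {sid: [h for _, h in sorted(hyps)] for sid, hyps in nbest.items()}

  hyps = read_nbest("dev.nbest")          # assumed file name
  print(hyps["DEV_001"][0])               # best ASR hypothesis for the 1st input
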
Test Corpus:
- Challenge Task
  - Chinese-English
  - English-Chinese
  - 500 sentences of the field-experiment data
  - coding: → see Development Corpus
  - data format: → see Development Corpus
- BTEC Task
  - Arabic-English
  - Chinese-English
  - Chinese-Spanish
  - 500 unseen sentences of the BTEC evaluation corpus
  - coding: → see Development Corpus
  - data format: → see Development Corpus
- PIVOT Task
  - Chinese-(English)-Spanish
  - 500 unseen sentences of the BTEC evaluation corpus
  - coding: → see Development Corpus
  - data format: → see Development Corpus
Translation Input Conditions
Spontaneous Speech
- Challenge Task
  - Chinese-English
  - English-Chinese
Read Speech
- BTEC Task
  - Arabic-English
  - Chinese-English
  - Chinese-Spanish
- PIVOT Task
  - Chinese-(English)-Spanish
Correct Recognition Results
- Challenge Task
  - Chinese-English
  - English-Chinese
- BTEC Task
  - Arabic-English
  - Chinese-English
  - Chinese-Spanish
- PIVOT Task
  - Chinese-(English)-Spanish
Evaluation
Subjective Evaluation:
- Metrics:
  - ranking
    → all primary run submissions
  - fluency/adequacy
    → top-scoring primary run submission (according to the average of its BLEU and METEOR scores) + up to 3 additional primary runs selected by the organizers (according to the level of innovation of the translation approach)
- Evaluators:
  - 3 graders per translation
Automatic Evaluation:
- Metrics:
  - BLEU
  - METEOR
  → all run submissions
- Evaluation Specifications:
  - Official:
    - case-sensitive
    - with punctuation marks tokenized
  - Additional:
    - case-insensitive (lower-case only)
    - no punctuation marks
- Data Processing Prior to Evaluation (rough sketches follow below):
  - English MT Output:
    - simple tokenization of punctuation marks (see 'tools/ppEnglish.case+punc.pl' script)
  - Spanish MT Output:
    - simple tokenization of punctuation marks (see 'tools/ppSpanish.case+punc.pl' script)
  - Chinese MT Output:
    - segmentation into characters (see 'tools/splitUTF8Characters' script)
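
The Perl scripts named above are authoritative; what follows is only a rough Python approximation of the two processing steps (punctuation tokenization for English/Spanish, character segmentation for Chinese), under the assumption that the official scripts do essentially this:

  import re

  # separate common punctuation marks from adjacent words
  # (rough stand-in for the official tokenization scripts)
  def tokenize_punctuation(text):
      return re.sub(r'([.,!?;:"()])', r' \1 ', text).split()

  # segment Chinese MT output into single characters, dropping whitespace
  def split_characters(text):
      return [ch for ch in text if not ch.isspace()]

  print(tokenize_punctuation("This is nice, isn't it?"))
  # -> ['This', 'is', 'nice', ',', "isn't", 'it', '?']
  print(" ".join(split_characters("这是一个测试")))
  # -> 这 是 一 个 测 试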
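Finally, a minimal sketch of multi-reference corpus scoring, using NLTK's corpus-level BLEU as a stand-in; the campaign's official scoring setup is not reproduced here, and the toy hypothesis/reference data are invented:

  from nltk.translate.bleu_score import corpus_bleu

  # one tokenized hypothesis per input sentence (after the preprocessing above)
  hypotheses = [["this", "is", "a", "test", "."]]
  # several reference translations per input (one list per paraphrase ID)
  references = [[["this", "is", "a", "test", "."],
                 ["this", "is", "one", "test", "."]]]

  print("BLEU: %.4f" % corpus_bleu(references, hypotheses))  # -> BLEU: 1.0000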