December 2009 Archive

Theme

The International Workshop on Spoken Language Translation (IWSLT) is a yearly, open evaluation campaign for spoken language translation followed by a scientific workshop, in which both system descriptions and scientific papers are presented. IWSLT's evaluations are not competition-oriented; rather, their goal is to foster cooperative work and scientific exchange. In this respect, IWSLT proposes challenging research tasks and an open experimental infrastructure for the scientific community working on spoken and written language translation.

Evaluation Campaign

The 6th International Workshop on Spoken Language Translation will take place in Tokyo, Japan in December 2009. The focus of this year's evaluation campaign is the translation of task-oriented human dialogs in travel situations. The speech data was recorded in interpreter-mediated sessions in which native speakers of different languages were asked to complete travel-related tasks, such as hotel reservations, using their mother tongue; the translation of the freely uttered conversation was carried out by human interpreters. The obtained speech data was annotated with dialog and speaker information. For the Challenge Task, IWSLT participants have to translate both the Chinese and the English outputs of the automatic speech recognizers (lattice, N/1BEST) into English and Chinese, respectively.

As in previous IWSLT events, a standard BTEC Task will be provided. However, the BTEC Task focuses on text input only, i.e., no automatic speech recognition results (lattice, N/1BEST) have to be translated. In addition to the Arabic-English and Chinese-English translation tasks, this year's evaluation campaign features Turkish as a new input language.

Each participant in the evaluation campaign is requested to submit a paper describing the MT system, the utilized resources, and results on the provided test data. Contrastive run submissions using only the bilingual resources provided by IWSLT, as well as investigations into the contribution of each utilized resource, are highly appreciated. Moreover, all participants are requested to present their papers at the workshop.

Scientific Paper

In addition to the evaluation campaign, the IWSLT 2009 workshop also invites scientific paper submissions related to spoken language technologies. Possible topics include, but are not limited to:

  • Spoken dialog modeling
  • Integration of ASR and MT
  • SMT, EBMT, RBMT, Hybrid MT
  • MT evaluation
  • Language resources for MT
  • Open source software for MT
  • Pivot-language-based MT
  • Task adaptation and portability in MT

Evaluation Campaign

The evaluation campaign is carried out using BTEC (Basic Travel Expression Corpus), a multilingual speech corpus containing tourism-related sentences similar to those that are usually found in phrasebooks for tourists going abroad. In addition, parts of the SLDB (Spoken Language Databases) corpus, a collection of human-mediated cross-lingual dialogs in travel situations, are provided to the participants of the Challenge Task. Details about the supplied corpora, the data set conditions for each track, the guidelines on how to submit one's translation results, and the evaluation specifications used in this workshop are given below.

Please note that, compared to previous IWSLT evaluation campaigns, the guidelines on how to use the language resources for each data track have changed for IWSLT 2009. Starting in 2007, we encouraged everyone to collect out-of-domain language resources and tools that could be shared between the participants. This was very helpful for many participants and allowed many interesting experiments, but it had the side effect that system outputs became difficult to compare: it was impossible to tell whether gains in performance came from better-suited (or simply larger) language resources (an engineering aspect) or from improvements in the underlying decoding algorithms and statistical models (a research aspect). After the IWSLT 2008 workshop, many participants asked us to focus on the research aspects for IWSLT 2009.

Therefore, the monolingual and bilingual language resources that may be used to train the translation engines for the primary runs are limited to the supplied corpus for each translation task. This includes all supplied development sets, i.e., you are free to use these data sets as you wish, for example for tuning of model parameters or as additional training bitext. All other language resources besides the ones for the given translation task should be treated as "additional language resources"; examples include additional dictionaries, word lists, and bitext corpora such as those provided by the LDC. In addition, some participants asked whether they could use the supplied BTEC TE and BTEC AE resources for the BTEC CE task; these should also be treated as "additional resources". Because it is impossible to limit the usage of linguistic tools like word segmentation tools, parsers, etc., such tools are allowed for preprocessing the supplied corpus, but we kindly ask participants to describe in detail in their system description paper which tools were applied for data preprocessing.

In order to motivate participants to continue to explore the effects of additional language resources (model adaptation, OOV handling, etc.), we DO ACCEPT contrastive runs based on additional resources. These will be evaluated automatically using the same framework as the primary runs, so the results will be directly comparable to this year's primary runs. Due to workshop budget limits, however, it would be difficult to include all contrastive runs in the subjective evaluation. Therefore, we kindly ask participants for a contribution if they would like to obtain a human assessment of their contrastive runs as well. If you intend to do so, please contact us as soon as possible, so that we can adjust the evaluation schedule accordingly. Contrastive run results will not appear in the overview paper, but participants are free to report their findings in the MT system description paper or even in a separate scientific paper submission.

[Corpus Specifications]

[Translation Input Conditions]

[Evaluation Specifications]



Corpus Specifications

BTEC Training Corpus:
  • data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <SENTENCE_ID>\01\<MT_TRAINING_SENTENCE>
    • Field_1: sentence ID
    • Field_2: paraphrase ID
    • Field_3: MT training sentence
  • example:
    • TRAIN_00001\01\This is the first training sentence.
    • TRAIN_00002\01\This is the second training sentence.
  • Arabic-English (AE)
  • Chinese-English (CE)
  • Turkish-English (TE)

    • 20K sentences randomly selected from the BTEC corpus
    • coding: UTF-8
    • text is case-sensitive and includes punctuation
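The '\'-delimited format above is simple to process. As an illustration only (the function name is hypothetical, not part of the supplied tools), a minimal Python sketch:

```python
# Hypothetical helper (not part of the supplied tools) for reading the
# '\'-delimited corpus files described above.
def parse_corpus_line(line):
    """Split one corpus line into (sentence_id, paraphrase_id, text)."""
    # maxsplit=2 keeps any further backslashes inside the sentence text intact
    sentence_id, paraphrase_id, text = line.rstrip("\n").split("\\", 2)
    return sentence_id, paraphrase_id, text
```

For example, `parse_corpus_line("TRAIN_00001\\01\\This is the first training sentence.")` yields the three fields listed above.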

BTEC Develop Corpus:

  • text input, reference translations of BTEC sentences
  • data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <SENTENCE_ID>\<PARAPHRASE_ID>\<TEXT>
    • Field_1: sentence ID
    • Field_2: paraphrase ID
    • Field_3: MT develop sentence / reference translation
  • text input example:
    • DEV_001\01\This is the first develop sentence.
    • DEV_002\01\This is the second develop sentence.
  • reference translation example:
  • DEV_001\01\1st reference translation for 1st input
    DEV_001\02\2nd reference translation for 1st input
    ...
    DEV_002\01\1st reference translation for 2nd input
    DEV_002\02\2nd reference translation for 2nd input
    ...
  • Arabic-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations
    • IWSLT05 testset: 506 sentences, 16 reference translations
    • IWSLT07 testset: 489 sentences, 6 reference translations
    • IWSLT08 testset: 507 sentences, 16 reference translations

  • Chinese-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations
    • IWSLT05 testset: 506 sentences, 16 reference translations
    • IWSLT07 testset: 489 sentences, 6 reference translations
    • IWSLT08 testset: 507 sentences, 16 reference translations

  • Turkish-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations

BTEC Test Corpus:

  • Arabic-English
  • Chinese-English
  • Turkish-English
    • 470 unseen sentences of the BTEC evaluation corpus
    • coding: → see BTEC Develop Corpus
    • data format: → see BTEC Develop Corpus


CHALLENGE Training Corpus:
  • TXT data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <DIALOG_ID>\<SENTENCE_ID>\<MT_TRAINING_SENTENCE>
    • Field_1: dialog ID
    • Field_2: sentence ID
    • Field_3: MT training sentence
    • example:
    • train_dialog01\01\This is the first training sentence.
    • train_dialog01\02\This is the second training sentence.
    • ...
  • INFO data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <DIALOG_ID>\<SENTENCE_ID>\<SPEAKER_TAG>
    • Field_1: dialog ID
    • Field_2: sentence ID
    • Field_3: speaker annotations ('a': agent, 'c': customer, 'i': interpreter)
    • example:
    • train_dialog01\01\a
    • train_dialog01\02\i
    • train_dialog01\03\a
    • ...
    • train_dialog398\20\i
    • train_dialog398\21\i
    • train_dialog398\22\c
  • Chinese-English (CE)
  • English-Chinese (EC)

    • 394 dialogs, 10K sentences from the SLDB corpus
    • coding: UTF-8
    • word segmentations according to ASR output segmentation
    • text is case-sensitive and includes punctuation
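Since the TXT and INFO files are keyed by the same dialog and sentence IDs, the speaker annotations can be joined onto the sentences. A hypothetical Python sketch (function and variable names are not from the supplied tools):

```python
# Hypothetical sketch: pair each CHALLENGE training sentence with its speaker
# tag, assuming the TXT and INFO files share the same
# <DIALOG_ID>\<SENTENCE_ID> keys.
def join_txt_info(txt_lines, info_lines):
    tags = {}
    for line in info_lines:
        dialog_id, sent_id, tag = line.rstrip("\n").split("\\", 2)
        tags[(dialog_id, sent_id)] = tag
    paired = []
    for line in txt_lines:
        dialog_id, sent_id, text = line.rstrip("\n").split("\\", 2)
        paired.append((dialog_id, sent_id, tags.get((dialog_id, sent_id)), text))
    return paired
```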

CHALLENGE Develop Corpus:

  • ASR output (lattice, NBEST, 1BEST), correct recognition result transcripts (text), reference translations of SLDB dialogs
  • data format:
    • 1-BEST
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\01\<RECOGNITION_HYPOTHESIS>
      • Field_1: sentence ID
      • Field_2: paraphrase ID
      • Field_3: best recognition hypothesis
      • example (input):
      • IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
        IWSLT09_CT.devset_dialog01_06\01\best ASR hypothesis for 3rd utterance
        ...
    • N-BEST
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\<NBEST_ID>\<RECOGNITION_HYPOTHESIS>
      • Field_1: sentence ID
      • Field_2: NBEST ID (max: 20)
      • Field_3: recognition hypothesis
      • example (input):
      • IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_02\02\2nd-best ASR hypothesis for 1st utterance
        ...
        IWSLT09_CT.devset_dialog01_02\20\20th-best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
        ...
    • reference translations
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\<PARAPHRASE_ID>\<REFERENCE>
      • Field_1: sentence ID
      • Field_2: paraphrase ID
      • Field_3: reference translation
      • example:
      • IWSLT09_CT.devset_dialog01_02\01\1st reference translation for 1st input
        IWSLT09_CT.devset_dialog01_02\02\2nd reference translation for 1st input
        ...
        IWSLT09_CT.devset_dialog01_04\01\1st reference translation for 2nd input
        IWSLT09_CT.devset_dialog01_04\02\2nd reference translation for 2nd input
        ...
  • Chinese-English
    • IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
    • IWSLT06 devset: 489 sentences, 16 reference translations (read speech, spontaneous speech)
    • IWSLT06 testset: 500 sentences, 16 reference translations (read speech, spontaneous speech)
    • IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
    • IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
    • IWSLT09 devset: 10 dialogs, 200 sentences, 4 reference translations (spontaneous speech)
  • English-Chinese
    • IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
    • IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
    • IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
    • IWSLT09 devset: 10 dialogs, 210 sentences, 4 reference translations (spontaneous speech)
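Since the N-BEST files store all hypotheses for an utterance under the same sentence ID, the 1-best subset can be recovered by filtering on the NBEST ID field. A hypothetical sketch (not one of the supplied tools):

```python
# Hypothetical sketch: reduce an N-BEST file to its 1-best hypotheses by
# keeping only the lines whose NBEST ID field is '01'.
def extract_1best(nbest_lines):
    best = []
    for line in nbest_lines:
        sent_id, nbest_id, hyp = line.rstrip("\n").split("\\", 2)
        if nbest_id == "01":
            best.append((sent_id, hyp))
    return best
```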

CHALLENGE Test Corpus:

  • Chinese-English
    • 27 dialogs, 405 sentences
    • coding: → see CHALLENGE Develop Corpus
    • TXT data format: → see CHALLENGE Develop Corpus
    • INFO data format: → see CHALLENGE Training Corpus
  • English-Chinese
    • 27 dialogs, 393 sentences
    • coding: → see CHALLENGE Develop Corpus
    • TXT data format: → see CHALLENGE Training Corpus
    • INFO data format: → see CHALLENGE Training Corpus


Translation Input Conditions

Spontaneous Speech

  • Challenge Task
    • Chinese-English
    • English-Chinese
→ ASR output (word lattice, N-best, 1-best) of ASR engines provided by IWSLT organizers

Correct Recognition Results

  • Challenge Task
    • Chinese-English
    • English-Chinese
  • BTEC Task
    • Arabic-English
    • Chinese-English
    • Turkish-English
→ text input

Evaluation

Subjective Evaluation:

  • Metrics:
    • ranking
      (= official evaluation metrics to order MT system scores)
      → all primary run submissions
    • fluency/adequacy
      → top-ranked primary run submission
    • dialog adequacy
      (= adequacy judgments in the context of the given dialog)
      → top-ranked primary run submission
  • Evaluators:
    • 3 graders per translation

Automatic Evaluation:

  • Metrics:
    • BLEU/NIST (NIST v13)
    • → bug fixes to handle empty translations and the IWSLT-supplied corpus can be found here.
    → up to 7 reference translations
    → all run submissions
  • Evaluation Specifications:
    • case+punc:
      • case sensitive
      • with punctuation marks tokenized
    • no_case+no_punc:
      • case insensitive (lower-case only)
      • no punctuation marks
  • Data Processing Prior to Evaluation:
    • English MT Output:
      • simple tokenization of punctuations (see 'tools/ppEnglish.case+punc.pl' script)
    • Chinese MT Output:
      • segmentation into characters (see 'tools/splitUTF8Characters' script)
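The two evaluation specifications and the Chinese character segmentation step can be approximated as follows. This is only a rough illustration (ASCII punctuation only); the official `tools/` scripts remain authoritative:

```python
import re
import string

# Rough approximation (ASCII punctuation only) of the no_case+no_punc
# evaluation specification; the official tools/ scripts are authoritative.
def normalize_no_case_no_punc(text):
    """Lowercase and strip punctuation marks."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def split_characters(text):
    """Segment a string into single characters separated by spaces,
    mimicking the character-level scoring of Chinese MT output."""
    return " ".join(ch for ch in text if not ch.isspace())
```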

Important Dates


Evaluation Campaign

Event Date
Training Corpus Release June 19, 2009
Test Corpus Release Aug 14, 2009
Run Submission Due Aug 28, 2009
Result Feedback to Participants September 11, 2009
MT System Descriptions Due September 18, 2009
Notification of Acceptance October 16, 2009
Camera-ready Paper Due October 31, 2009
Workshop December 1 - 2, 2009

Technical Papers

Event Date
Paper Submission Due August 21, 2009
Notification of Acceptance October 9, 2009
Camera-ready Paper Due October 31, 2009
Workshop December 1 - 2, 2009

Downloads


IWSLT 2009 Corpus Release (for IWSLT 2009 participants only)

User License Agreement DOC, PDF
Download
  CHALLENGE Task
  • Chinese ↔ English
  BTEC Task
  • Arabic → English
  • Chinese → English
  • Turkish → English

In order to get access to the corpus, please follow the procedure below. Access will be enabled AFTER we have received your original signed user license agreement.

  1. download the post-workshop user license agreement (click on the DOC/PDF link above), sign it, and send two copies to:

    Michael Paul
    National Institute of Information and Communications Technology
    Knowledge Creating Communication Research Center
    MASTAR Project
    Language Translation Group
    3-5 Hikaridai, "Keihanna Science City"
    Kyoto 619-0289, Japan

  2. download the corpus files using the ID and Password you obtained for the download of the training data files for IWSLT 2009.


Corpus Data Files

- train:
  • (BTEC) 20K sentence pairs of translation examples with case and punctuation information, segmented according to the utilized ASR engine
  • (CHALLENGE) in addition to BTEC@train, 10K sentence pairs of translation examples with dialog annotations
- dev:
  • up to 6 evaluation data sets containing 500 source language sentences with multiple references and ASR output data files (= testsets of previous IWSLT evaluation campaigns)
- test:
  • 500 source language sentences and ASR output data files (= input of run submissions of this year's evaluation campaign)
- tools:
  • preprocessing scripts (tokenization, NBEST extraction, etc.) used to prepare the data sets

For data set details, click on the translation direction name tag.


CHALLENGE
Chinese-English
train dev test tools
TGZ TGZ TGZ TGZ
English-Chinese
train dev test tools
TGZ TGZ TGZ TGZ

BTEC
Arabic-to-English
train dev test tools
TGZ TGZ TGZ TGZ
Chinese-to-English
train dev test tools
TGZ TGZ TGZ TGZ
Turkish-to-English
train dev test tools
TGZ TGZ TGZ TGZ



Templates for LaTeX/MSWord

- Gzipped TAR archive (all template files): latex_template_iwslt09.tgz

- LaTeX style: iwslt09.sty
- Example document: template.tex
- Example document PS: template.ps
- Example document PDF: template.pdf
- Bibliography style: IEEEtran.bst
- MS-Word template: template.doc

Submission


Technical Papers and MT System Descriptions must be submitted electronically in PDF format using the above links. The style files and templates are available on the download page. Authors are strongly encouraged to use the provided LaTeX style files or their MS-Word equivalents. Submissions should follow the "Paper Submission Format Guidelines" listed below.

Paper Submission Format Guidelines
The format of each paper submission (evaluation campaign and technical paper) should agree with the "Camera-Ready Paper Format Guidelines" listed below.

Camera-Ready Paper Format Guidelines
  • PDF file format
  • Maximum eight (8) pages (Standard A4 size: 210 mm by 297 mm preferred)
  • Single-spaced
  • Two (2) columns
  • Printable in black ink on white paper; check that the positioning (left and top margins) as well as other layout features are correct.
  • No smaller than nine (9) point type font throughout the paper, including figure captions.
  • To achieve the best viewing experience for the Proceedings, we strongly encourage authors to use the Times-Roman font (the LaTeX style file as well as the Word template files use Times-Roman). This is needed in order to give the Proceedings a uniform look.
  • Do NOT include headers and footers. Page numbers and conference identification will be added automatically when the Proceedings are printed.
  • The first page should have the paper title, author(s), and affiliation(s) centered on the page across both columns. The remainder of the text must be in the two-column format, staying within the indicated image area.
  • Follow the style of the sample paper that is included with regard to title, authors, affiliations, abstract, heading, and subheadings.
Paper Title The paper title must be in boldface. All non-function words must be capitalized, and all other words in the title must be lower case. The paper title is centered across the top of the two columns on the first page, as indicated above.
Authors' Name(s) The authors' name(s) and affiliation(s) appear centered below the paper title. If space permits, include a mailing address here. The templates indicate the area where the title and author information should go. These items need not be strictly confined to the number of lines indicated; papers with multiple authors and affiliations, for example, may require two or more lines for this information.
Abstract Each paper must contain an abstract that appears at the beginning of the paper.
Major Headings Major headings are in boldface, with the first word capitalized and the rest of the heading in lower case. Examples of the various levels of headings are included in the templates.
Sub Headings Sub headings appear like major headings, except they start at the left margin in the column.
Sub-Sub Headings Sub-sub headings appear like sub headings, except they are in italics and not bold face.
References Number and list all references at the end of the paper. The references are numbered in order of appearance in the document. When referring to them in the text, type the corresponding reference number in square brackets as shown at the end of this sentence [1]. (This is done automatically when using the LaTeX template.)
Illustrations Illustrations must appear within the designated margins, and must be positioned within the paper margins. They may span the two columns. If possible, position illustrations at the top of columns, rather than in the middle or at the bottom. Caption and number every illustration. All half-tone or color illustrations must be clear when printed in black and white.


Templates
If your paper will be typeset using LaTeX, please download the template package here; it will generate the proper format. To extract the files under UNIX, run: $ tar -xzf latex_template_iwslt09.tgz
Paper Status
After submission, each paper will be given a unique Paper ID and a password. This will be shown on the confirmation page right after submission of the documents and a confirmation email including the Paper ID will be sent to the author of the paper as well. It will be possible to check and correct (if necessary) the submitted paper information (names, affiliations, etc.). Corrections/uploads can be made up to the respective submission deadline.

Paper Acceptance/Rejection Information
Each corresponding author will be notified by e-mail of acceptance/rejection. Reviewer feedback will also be available for each paper.

Run Submission Guidelines


BTEC Translation Task (BTEC_AE, BTEC_CE, BTEC_TE)

data format:

  • same format as the DEVELOP data sets.
For details, refer to the respective README files:
+ IWSLT/2009/corpus/BTEC/Arabic-English/README.BTEC_AE.txt
+ IWSLT/2009/corpus/BTEC/Chinese-English/README.BTEC_CE.txt
+ IWSLT/2009/corpus/BTEC/Turkish-English/README.BTEC_TE.txt
  • input text is case-sensitive and contains punctuation
  • English MT output should:
    • be in the same format as the input file (<SentenceID>\01\MT_output_text)
    • be case-sensitive, with punctuation
    • contain the same number of lines (= sentences) as the input file
Example:
     TEST_IWSLT09_001\01\This is the E translation of the 1st sentence.
     TEST_IWSLT09_002\01\This is the E translation of the 2nd sentence.
     TEST_IWSLT09_003\01\
     TEST_IWSLT09_004\01\The previous input (ID=003) could not be translated, thus the translation is empty!
     TEST_IWSLT09_005\01\...
     ...
     TEST_IWSLT09_469\01\This is the E translation of the last sentence.
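Before mailing a run, it may help to sanity-check that the output file is line-aligned with the input file. A hypothetical pre-submission check (not an official tool):

```python
# Hypothetical pre-submission check (not an official tool): verify that an
# MT output file matches its input file line-for-line by sentence ID.
def validate_run(input_lines, output_lines):
    if len(input_lines) != len(output_lines):
        return False
    for src, hyp in zip(input_lines, output_lines):
        # compare only the <SentenceID> field before the first backslash
        if src.split("\\", 1)[0] != hyp.split("\\", 1)[0]:
            return False
    return True
```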

run submission format:
  • each participant has to translate and submit at least one translation of the given input files for each of the translation tasks they registered for.
  • multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
  • runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
TAR archive file structure:
<UserID>/<TranslationTask>.<UserID>.primary.txt
        /<TranslationTask>.<UserID>.contrastive1.txt
        /<TranslationTask>.<UserID>.contrastive2.txt
        /...
    where: <UserID> = user ID of participant used to download data files
           <TranslationTask> = BTEC_AE | BTEC_CE | BTEC_TE

Examples:
nict/BTEC_AE.nict.primary.txt
   /BTEC_CE.nict.primary.txt
   /BTEC_CE.nict.contrastive1.txt
   /BTEC_CE.nict.contrastive2.txt
   /BTEC_CE.nict.contrastive3.txt
   /BTEC_TE.nict.primary.txt
   /BTEC_TE.nict.contrastive1.txt      
  • re-submitting your runs is allowed as long as the emails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2009 evaluation; previous mails will be ignored.
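The required archive layout can be produced with standard tools. As a hypothetical sketch (function name and paths are illustrative, not prescribed), one way to package run files under the `<UserID>/` prefix:

```python
import os
import tarfile

# Hypothetical sketch: package run files under the required <UserID>/ prefix
# inside a gzipped TAR archive.
def build_submission(user_id, run_files, archive_path):
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in run_files:
            # store each file as <UserID>/<filename> inside the archive
            tar.add(path, arcname=f"{user_id}/{os.path.basename(path)}")
```

For example, `build_submission("nict", ["BTEC_CE.nict.primary.txt"], "runs.tgz")` would create an archive matching the structure shown above.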

CHALLENGE Translation Task (CT_CE, CT_EC)

data format:

  • same format as the DEVELOP data sets.
For details, refer to the respective README files:
+ IWSLT/2009/corpus/CHALLENGE/Chinese-English/
  README.CT_CE.txt
+ IWSLT/2009/corpus/CHALLENGE/English-Chinese/
  README.CT_EC.txt
  • the input data sets are created from the speech recognition results (ASR output) and therefore are CASE-INSENSITIVE and do NOT contain punctuation
  • the input data sets of the CHALLENGE tasks are separated according to the source language:
   + Chinese input data:
       IWSLT/2009/corpus/CHALLENGE/Chinese-English/test
   + English input data:
       IWSLT/2009/corpus/CHALLENGE/English-Chinese/test
The dialog structure is reflected in the respective sentence ID.
Example:
(dialog structure)

  IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
  IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
  IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
  IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
  IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
  IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
  ...
(English input data to be translated into Chinese)
    + IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
      IWSLT09_CT.testset.en.txt

      IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
      IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
      IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
      ...
(Chinese input data to be translated into English)
    + IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
      IWSLT09_CT.testset.zh.txt

      IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
      IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
      IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
      ...
  • English MT output should:
     + be in the same format as the Chinese input file
       (<SentenceID>\01\MT_output_text)
     + be case-sensitive, with punctuation
     + contain the same number of lines (= sentences) as the Chinese input file
Example:
    + nict/CT_CE.nict.primary.txt
      IWSLT09_CT.testset_dialog01_02\01\...E translation of 1st Chinese utterance...
      IWSLT09_CT.testset_dialog01_04\01\...E translation of 2nd Chinese utterance...
      IWSLT09_CT.testset_dialog01_05\01\...E translation of 3rd Chinese utterance...
      ...
  • Chinese MT output should:
     + be in the same format as the English input file (<SentenceID>\01\MT_output_text)
     + be case-sensitive, with punctuation
     + contain the same number of lines (= sentences) as the English input file
Example:
    + nict/CT_EC.nict.primary.txt
      IWSLT09_CT.testset_dialog01_01\01\...C translation of 1st English utterance...
      IWSLT09_CT.testset_dialog01_03\01\...C translation of 2nd English utterance...
      IWSLT09_CT.testset_dialog01_06\01\...C translation of 3rd English utterance...
      ...
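Since the dialog structure is encoded in the sentence IDs, translated CE and EC outputs can be interleaved back into dialog order for inspection. A hypothetical sketch, assuming the IDs (e.g. IWSLT09_CT.testset_dialog01_02) sort lexicographically within a dialog:

```python
# Hypothetical sketch: interleave translated CE and EC output lines back
# into dialog order, assuming the sentence IDs sort lexicographically
# within a dialog (two-digit dialog and utterance numbers).
def merge_dialog(ce_lines, ec_lines):
    return sorted(ce_lines + ec_lines, key=lambda line: line.split("\\", 1)[0])
```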

run submission format:
  • each participant registered for the Challenge Task has to translate both translation directions (English-Chinese AND Chinese-English) and submit a total of 4 MT output files per run:
    + translations of 2 input data conditions (CRR, ASR) for Chinese-English AND
    + translations of 2 input data conditions (CRR, ASR) for English-Chinese.

(1) the correct recognition result (CRR) data files, i.e., the human transcriptions of the Challenge Task data files that do not include recognition errors:
         CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
             IWSLT09_CT.testset.zh.txt
         EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
             IWSLT09_CT.testset.en.txt

(2) the speech recognition output (ASR output, with recognition errors), whereby the participants are free to choose any of the following three ASR output data types as the input of their MT system:
        (a) word lattices:
            CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                SLF/testset/*.zh.SLF
            EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                SLF/testset/*.en.SLF

        (b) NBEST hypotheses:
            CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                NBEST/IWSLT09.testset.zh.20BEST.txt
                or
                IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                NBEST/testset/*.zh.20BEST.txt

            EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                NBEST/IWSLT09.testset.en.20BEST.txt
                or
                IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                NBEST/testset/*.en.20BEST.txt
[NOTE] larger NBEST lists can be generated from the lattice data files using the following tools:
              + IWSLT/2009/corpus/CHALLENGE/Chinese-English/tools/
                extract_NBEST.zh.CT_CE.testset.sh
              + IWSLT/2009/corpus/CHALLENGE/English-Chinese/tools/
                extract_NBEST.en.CT_EC.testset.sh

        (c) 1BEST hypotheses:
            CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                1BEST/IWSLT09.testset.zh.1BEST.txt
                or               
                IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                1BEST/testset/*.zh.1BEST.txt
            EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                1BEST/IWSLT09.testset.en.1BEST.txt
                or
                IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                1BEST/testset/*.en.1BEST.txt
[NOTE] submissions containing only the results for one translation direction will be excluded from the subjective evaluation for IWSLT 2009.
  • multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
  • runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
TAR archive file structure:
<UserID>/CT_CE.<UserID>.primary.CRR.txt
        /CT_CE.<UserID>.primary.ASR.<CONDITION>.txt
        /CT_EC.<UserID>.primary.CRR.txt
        /CT_EC.<UserID>.primary.ASR.<CONDITION>.txt
        /...
where: <UserID> = user ID of participant used to download data files
      <CONDITION> = SLF | <NUM>
      <NUM> = number of recognition hypotheses used for translation, e.g.,
                '1'  - 1-best recognition result
                '20' - 20-best hypotheses list
Examples:
    nict/CT_CE.nict.primary.CRR.txt
        /CT_CE.nict.primary.ASR.SLF.txt
        /CT_EC.nict.primary.CRR.txt
        /CT_EC.nict.primary.ASR.SLF.txt

        /CT_CE.nict.contrastive1.CRR.txt
        /CT_CE.nict.contrastive1.ASR.1.txt
        /CT_EC.nict.contrastive1.CRR.txt
        /CT_EC.nict.contrastive1.ASR.1.txt

        /CT_CE.nict.contrastive2.CRR.txt
        /CT_CE.nict.contrastive2.ASR.20.txt
        /CT_EC.nict.contrastive2.CRR.txt
        /CT_EC.nict.contrastive2.ASR.20.txt      
  • re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If the same participant submits multiple TAR archives, only the runs in the most recent submission mail will be used for the IWSLT 2009 evaluation; earlier mails will be ignored.
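The archive layout described above can be produced with a few lines of Python. This is a minimal sketch, not an official tool: it assumes the four PRIMARY hypothesis files follow the naming scheme shown in the examples (here it creates empty placeholders just so the sketch runs end-to-end), and `nict` stands in for your own UserID. The same archive could of course be built with `tar czf` on the command line.

```python
import tarfile

user_id = "nict"  # replace with the UserID used to download the data files

# PRIMARY run files, named according to the required scheme:
# CT_<direction>.<UserID>.<run>.<input>[.<condition>].txt
runs = [
    f"CT_CE.{user_id}.primary.CRR.txt",
    f"CT_CE.{user_id}.primary.ASR.SLF.txt",
    f"CT_EC.{user_id}.primary.CRR.txt",
    f"CT_EC.{user_id}.primary.ASR.SLF.txt",
]

# For this sketch only: create empty placeholder files. In practice these
# are your translation hypothesis files, one translated sentence per line.
for name in runs:
    open(name, "a").close()

# Build the gzipped TAR archive; every file must sit under the
# <UserID>/ top-level directory inside the archive.
with tarfile.open(f"{user_id}_runs.tar.gz", "w:gz") as tar:
    for name in runs:
        tar.add(name, arcname=f"{user_id}/{name}")
```

CONTRASTIVE runs (e.g. `CT_CE.nict.contrastive1.ASR.1.txt`) are added to the same archive in the same way.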
We have prepared an online evaluation server that allows you to conduct additional experiments and confirm the effectiveness of innovative methods and features within the IWSLT 2009 evaluation framework. You can submit translation hypothesis files for any of the IWSLT 2009 translation tasks. The hypothesis file format is the same as for the official run submissions.

Before you can submit runs, you have to register a UserID/PassID. After logging in, click "Make a new Submission", select the "Translation Direction" and "Training Data Condition" used to generate the hypothesis file, upload the hypothesis file, specify a system ID and a short description that allows you to easily identify the run submission, and press "Calculate Scores".

The server sequentially calculates automatic scores for BLEU/NIST, WER/PER/TER, METEOR/F1/PREC/RECL, and GTM. The automatic scoring results will then be sent to you via email. In addition, you can access the "Submission Log", which keeps track of all your run submissions. For details on a specific run, click on the respective "Date". The scoring results for the "case+punc" evaluation specification (case-sensitive, with punctuation) are displayed in bold-face, and the scoring results for the "no_case+no_punc" evaluation specification (case-insensitive, without punctuation) are displayed in brackets.
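Of the metrics the server computes, WER is the simplest to reproduce locally: it is the word-level Levenshtein (edit) distance between hypothesis and reference, normalized by the reference length. The sketch below illustrates the idea with standard dynamic programming; it is not the server's implementation, and the example sentences are made up.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level edit distance, normalized by reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# two substitutions over a four-word reference -> WER = 0.5
print(word_error_rate("the hotel is booked", "a hotel was booked"))
```

PER relaxes this by ignoring word order, and TER additionally allows block shifts, so both require more machinery than this sketch.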

Registration


Workshop

The registration for the IWSLT 2009 workshop is now open. Please access the registration server and fill out the registration form.

Registration Fees:

            Regular      Student      Deadline         Payment
  Early     JPY 20,000   JPY 15,000   Nov 20, 2009     online
  Late      JPY 25,000   JPY 20,000   Nov 30, 2009     on-the-door (registration desk)
  On-site   JPY 30,000   JPY 25,000   Dec 1-2, 2009    on-the-door (registration desk)

The registration fee includes daily lunch, coffee breaks, a USB-stick version of the proceedings, participation in all sessions, and a banquet dinner on December 1. Please note that the registration fee is not refundable under any circumstances.

Online payment is possible only until the extended Early Registration deadline. During the Late Registration period, you still fill out the registration form online, but payment has to be made at the door at the workshop registration desk (7F Miraikan).

For on-the-door payments, only cash can be accepted.

If you need a visa to come to Japan, please contact the IWSLT Secretariat (iwslt@the-convention.co.jp) as soon as possible, and no later than October 2, 2009.

If you don't know whether you need a visa, please check here.


Accommodations


Hotel rates differ from day to day.
Please check out the links below or contact the hotel directly.


ホテル グランパシフィック LE DAIBA
(Grand Pacific Le Daiba)

[map] [reservations]
「〒135-8701 東京都港区台場2-6-1」
2-6-1 Daiba, Minato-ku, Tokyo 135-8701, Japan
(Tel) +81-3-5500-6711 (Fax) +81-3-5500-4507
http://www.grandpacific.jp/eng (English)
http://www.grandpacific.jp (Japanese)
19,000 JPY〜 (1 person, 1 night)
(U07) Daiba station [map]
←1min,180JPY→ (U08) Fune-no-kagakukan station [venue]

三井ガーデンホテル汐留イタリア街
(Mitsui Garden Hotel Shiodome Italia-gai)

[map] [reservations]
「〒105-0021 東京都港区東新橋2-14-24」
2-14-24, Higashi-shimbashi, Minato-ku, Tokyo 105-0021, Japan
(Tel) +81-3-3431-1131 (Fax) +81-3-3431-2431
http://www.gardenhotels.co.jp/eng/shiodome.html (English)
http://www.gardenhotels.co.jp/shiodome/index.html (Japanese)
10,300 JPY〜 (1 person, 1 night)
(U02) Shiodome station [map]
←15min,310JPY→ (U08) Fune-no-kagakukan station [venue]

ホテルヴィラ フォンテーヌ汐留
(Hotel Villa Fontaine Shiodome)

[map] [reservations]
「〒105-0021 東京都港区東新橋1-9-2」
1-9-2 Higashi-shinbashi Minato-ku, Tokyo 105-0021, Japan
(Tel) +81-3-3569-2220 (Fax) +81-3-3569-2111
http://www.hvf.jp/eng/shiodome.php (English)
http://www.hvf.jp/shiodome (Japanese)
http://www.hvf.jp/chi/shiodome.html (Chinese)
10,000 JPY〜 (1 person, 1 night)
(U02) Shiodome station [map]
←15min,310JPY→ (U08) Fune-no-kagakukan station [venue]

ホテル日航東京
(Hotel Nikko Tokyo)

[map] [reservations]
「〒135-8625 東京都港区台場1丁目9番1号」
1-9-1 Daiba, Minato-ku, Tokyo 135-8625, Japan
(Tel) +81-3-5500-5500 (Fax) +81-3-5500-2525
http://www.hnt.co.jp/en/index.html (English)
http://www.hnt.co.jp/ (Japanese)
http://www.jalhotels.com/cn/domestic/kanto/index.html#tokyo (Chinese)
9,500 JPY〜 (1 person, 1 night)
(U07) Daiba station [map]
←1min,180JPY→ (U08) Fune-no-kagakukan station [venue]

ホテルトラスティ東京ベイサイド
(Hotel Trusty Tokyo Bayside)

[map] [reservations]
「〒135-0063 東京都江東区有明3-1-5」
3-1-5 Ariake, Koto-ku, Tokyo 135-0063, Japan
(Tel) +81-3-6700-0001 (Fax) +81-3-6700-0007
http://www.trusty.jp/tokyobayside/pdf/tokyobayside_e.pdf (English)
http://www.trusty.jp/tokyobayside (Japanese)
6,700 JPY〜 (1 person, 1 night)
(U11) Kokusai-tenjijou-seimon station [map]
←6min,240JPY→ (U08) Fune-no-kagakukan station [venue]

ホテルサンルート有明
(Hotel Sunroute Ariake)

[map] [reservations]
「〒135-0063 東京都江東区有明3-1-20」
3-1-20 Ariake, Koto-ku, Tokyo 135-0063, Japan
(Tel) +81-3-5530-3610
http://www.sunroutehotel.jp/hari-eng/index.asp (English)
http://www.sunroutehotel.jp/ariake/ (Japanese)
http://www.sunroutehotel.jp/hari-chi/index.asp (Chinese)
6,500 JPY〜 (1 person, 1 night)
(U11) Kokusai-tenjijou-seimon station [map]
←6min,240JPY→ (U08) Fune-no-kagakukan station [venue]

東京ベイ有明ワシントンホテル
(Tokyo Bay Ariake Washington Hotel)

[map] [reservations]
「〒135-0063 東京都江東区有明3-1-28」
3-1-28 Ariake, Koto-ku, Tokyo 135-0063, Japan
(Tel) +81-3-5564-0111 (Fax) +81-3-5564-0525
http://www.wh-rsv.com/english/tokyo_bay_ariake (English)
http://www.wh-rsv.com/wh/hotels/ariake/index.html (Japanese)
http://www.wh-rsv.com/chinese/tokyo_bay_ariake (Chinese)
6,000 JPY〜 (1 person, 1 night)
(U12) Ariake station [map]
←7min,240JPY→ (U08) Fune-no-kagakukan station [venue]

Program


December 1, 2009

09:00 09:30 workshop registration

Workshop Opening
09:30 09:40 Welcome Remarks
Satoshi NAKAMURA (NICT, Japan)
Evaluation Campaign: "Overview Talk"
09:40 10:10 Overview of the IWSLT 2009 Evaluation Campaign
Michael PAUL (NICT, Japan)
coffee break
Evaluation Campaign: "Challenge Task"
10:30 11:00 Two methods for stabilizing MERT: NICT at IWSLT 2009
Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA (NICT, Japan)
11:00 11:30 Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009
Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY (Dublin City University, Ireland)
11:30 12:00 The CASIA Statistical Machine Translation System for IWSLT 2009
Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; China)
lunch break
Invited Talk
13:00 14:00 Human Translation and Machine Translation
Philipp KOEHN (University of Edinburgh, UK)
Technical Paper: "Oral I"
14:00 14:30 Morphological Pre-Processing for Turkish to English Statistical Machine Translation
Arianna BISAZZA, Marcello FEDERICO (FBK-irst, Italy)
14:30 15:00 Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing
Martin CMEJREK, Bowen ZHOU, Bing XIANG (IBM, USA)
15:00 15:30 A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation
Hieu HOANG, Philipp KOEHN, Adam LOPEZ (Univ. Edinburgh, UK)
coffee break
Evaluation Campaign: "Poster I"
15:50 16:50 The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009
Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN (TÜBITAK-UEKAE, Turkey)
15:50 16:50 The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures
Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII (University of Tokyo, Japan)
15:50 16:50 The GREYC Translation Memory for the IWSLT2009 Evaluation Campaign: one step beyond translation memory
Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME (University of Caen, France)
15:50 16:50 The ICT Statistical Machine Translation Systems for the IWSLT 2009
Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU (Institute of Computing Technology, Chinese Academy of Sciences; China)
15:50 16:50 The University of Washington Machine Translation System for IWSLT 2009
Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF (University of Washington, USA)
15:50 16:50 Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation
Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA (Tottori University, Japan)
Demo Session
16:50 17:20 Network-based Speech-to-Speech Translation
Chiori HORI, Sakriani SAKTI, Michael PAUL, Satoshi NAKAMURA (NICT, Japan)
Banquet
18:00 20:00 Restaurant "LA TERRE" (Miraikan, 7F)

December 2, 2009

09:00 09:30 workshop registration

Invited Talk
09:30 10:30 Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
Premkumar NATARAJAN (BBN Technologies, USA)
coffee break
Technical Paper: "Oral II"
10:50 11:20 Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation
Katsuhiko HAYASHI (Doshisha University, Japan); Taro WATANABE, Hajime TSUKADA, Hideki ISOZAKI (NTT, Japan)
11:20 11:50 Online Language Model Adaptation for Spoken Dialog Translation
Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain); Mauro CETTOLO, Nicola BERTOLDI, Marcello FEDERICO (FBK-irst, Italy)
lunch break
Invited Talk
13:00 14:00 Monolingual Knowledge Acquisition and a Multilingual Information Environment
Kentaro TORISAWA (NICT, Japan)
Evaluation Campaign: "Poster II"
14:00 15:00 AppTek Turkish-English Machine Translation System Description for IWSLT 2009
Selçuk KÖPRÜ (Apptek Inc., Turkey)
14:00 15:00 LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic
Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
14:00 15:00 Barcelona Media SMT system description for the IWSLT 2009: introducing source context information
Marta R. COSTA-JUSSA, Rafael E. BANCHS (Barcelona Media, Spain)
14:00 15:00 FBK @ IWSLT-2009
Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
14:00 15:00 LIUM's Statistical Machine Translation Systems for IWSLT 2009
Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT (University of Le Mans, France)
14:00 15:00 I²R's Machine Translation System for IWSLT 2009
Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI (Institute for Infocomm Research, Singapore)
coffee break
Evaluation Campaign: "BTEC Task"
15:20 15:50 The NUS Statistical Machine Translation System for IWSLT 2009
Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG (National University of Singapore, Singapore)
15:50 16:20 The UPV Translation System for IWSLT 2009
Guillem GASCÓ, Joan Andreu SÁNCHEZ (Universitat Politècnica de València, Spain)
16:20 16:50 The MIT-LL/AFRL System for IWSLT 2009
Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory, USA)
Workshop Closing
16:50 17:00 Closing Remarks
Marcello FEDERICO (FBK-irst, Italy)

Keynote Speeches


Keynote Speech 1

Human Translation and Machine Translation
Philipp KOEHN (University of Edinburgh, UK)
While most recent machine translation work has focused on the gisting application (i.e., translating web pages), another important application is to aid human translators. To build better computer-aided translation tools, we first need to understand how human translators work. We discuss how human translators work and what tools they typically use. We also build a novel tool that offers post-editing, interactive sentence completion, and display of translation options (online at www.caitra.org). We collected timing logs of interactions with the tool, which allow detailed analysis of translator behavior.

Keynote Speech 2

Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
Premkumar NATARAJAN (BBN Technologies, USA)
Two-way speech-to-speech (S2S) translation is a spoken language application that integrates multiple technologies including speech recognition, machine translation, text-to-speech synthesis, and dialog management. In recent years, research into S2S systems has resulted in several modeling techniques for improving coverage on broad domains and rapid configuration for new language pairs or domains. This talk will highlight recent advances in the S2S area, ranging from improvements in component technologies to improvements in the end-to-end system for mobile use. I will also present metrics for evaluating S2S technology, a methodology for determining the impact of different causes of errors, and future directions for research and development.

Keynote Speech 3

Monolingual Knowledge Acquisition and a Multilingual Information Environment
Kentaro TORISAWA (NICT, Japan)
Large-scale knowledge acquisition from the Web has been a popular research topic over the last five years. This talk gives an overview of our current project aiming at the acquisition of a large-scale semantic network from the Web, and explores its possible interaction with machine translation research. In particular, I would like to focus on two topics: multilingual corpora as a source of knowledge, and the applications of machine translation enabled by our technology. I will discuss a framework of bilingual co-training that gives a marked improvement in the accuracy of the acquired knowledge by using two corpora written in two different languages. I will also show that our technology can enable a new type of machine translation task in Web applications.

Proceedings

- Author Index -

Evaluation Campaign
pp.1-18 paper slides bib Overview of the IWSLT 2009 Evaluation Campaign
Michael PAUL
pp.19-23 paper (not yet) bib apptek
AppTek Turkish-English Machine Translation System Description for IWSLT 2009
Selçuk KÖPRÜ
pp.24-28 paper poster bib bmrc
Barcelona Media SMT system description for the IWSLT 2009: introducing source context information
Marta R. COSTA-JUSSA, Rafael E. BANCHS
pp.29-36 paper slides bib dcu
Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009
Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY
pp.37-44 paper poster bib fbk
FBK @ IWSLT-2009
Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
pp.45-49 paper poster bib greyc
The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory
Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME
pp.50-54 paper poster bib i2r
I²R's Machine Translation System for IWSLT 2009
Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI
pp.55-59 paper poster bib ict
The ICT Statistical Machine Translation Systems for the IWSLT 2009
Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU
pp.60-64 paper poster bib lig
LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic
Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
pp.65-70 paper poster bib lium
LIUM's Statistical Machine Translation Systems for IWSLT 2009
Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT
pp.71-78 paper slides bib mit
The MIT-LL/AFRL IWSLT-2009 System
Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory)
pp.79-82 paper slides bib nict
Two methods for stabilizing MERT: NICT at IWSLT 2009
Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA
pp.83-90 paper slides bib nlpr
The CASIA Statistical Machine Translation System for IWSLT 2009
Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG
pp.91-98 paper slides bib nus
The NUS Statistical Machine Translation System for IWSLT 2009
Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG
pp.99-106 paper poster bib tokyo
The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures
Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII
pp.107-112 paper poster bib tottori
Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation
Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA
pp.113-117 paper poster bib tubitak
The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009
Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN
pp.118-123 paper slides bib upv
UPV Translation System for IWSLT 2009
Guillem GASCÓ, Joan Andreu SÁNCHEZ
pp.124-128 paper poster bib uw
The University of Washington Machine Translation System for IWSLT 2009
Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF

Technical Paper
pp.129-135 paper slides bib Morphological Pre-Processing for Turkish to English Statistical Machine Translation
Arianna BISAZZA, Marcello FEDERICO
pp.136-143 paper slides bib Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing
Martin CMEJREK, Bowen ZHOU, Bing XIANG
pp.144-151 paper slides bib Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation
Katsuhiko HAYASHI, Taro WATANABE, Hajime TSUKADA, Hideki ISOZAKI
pp.152-159 paper slides bib A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation
Hieu HOANG, Philipp KOEHN, Adam LOPEZ
pp.160-167 paper slides bib Online Language Model Adaptation for Spoken Dialog Translation
German SANCHIS-TRILLES, Mauro CETTOLO, Nicola BERTOLDI, Marcello FEDERICO

Demo
pp.168-168 paper slides bib Network-based Speech-to-Speech Translation
Chiori HORI, Sakriani SAKTI, Michael PAUL, Noriyuki KIMURA, Yutaka ASHIKARI, Ryosuke ISOTANI, Eiichiro SUMITA, Satoshi NAKAMURA (NICT, Japan)

Keynote Speech
      -       abstract slides - Human Translation and Machine Translation
Philipp KOEHN (University of Edinburgh, UK)
      -       abstract (not yet) - Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
Premkumar NATARAJAN (BBN Technologies, USA)
      -       abstract slides - Monolingual Knowledge Acquisition and a Multilingual Information Environment
Kentaro TORISAWA (NICT, Japan)

Author Index


  A  
  AMINZADEH, Arya Ryan  71
  ANDERSON, Timothy   71
  ASHIKARI, Yutaka   168
  AXELROD, Amittai   124
  B  
  BANCHS, Rafael E.   24
  BARRAULT, Loïc   65
  BERTOLDI, Nicola   37, 160
  BESACIER, Laurent   60
  BISAZZA, Arianna   37, 129
  BLANCHON, Hervé   60
  BOUGARES, Fethi   60
  C  
  ÇETINOGLU, Özlem   29
  CETTOLO, Mauro   37, 160
  CMEJREK, Martin   136
  COSTA-JUSSÀ, Marta R.   24
  D  
  DELANEY, Brian   71
  DOGAN, Mehmet Ugur   113
  DU, Jinhua   29
  DUAN, Xiangyu   50
  DUH, Kevin   124
  E  
  ESTÈVE, Yannick   65
  F  
  FEDERICO, Marcello   37, 129, 160
  FENG, Yang   55
  G  
  GASCÓ, Guillem   118
  GOSME, Julien   45
  H  
  HAYASHI, Katsuhiko   144
  HOANG, Hieu   152
  HORI, Chiori   168
  I  
  IKEHARA, Satoru   107
  ISOTANI, Ryosuke   168
  ISOZAKI, Hideki   144
  K  
  KAYA, Hamza   113
  KIMURA, Noriyuki   168
  KIRCHHOFF, Katrin   124
  KOEHN, Philipp   152
  KÖPRÜ, Selçuk   19
  L  
  LAMBERT, Patrik   65
  LARDILLEUX, Adrien   45
  LEPAGE, Yves   45
  LI, Haizhou   50
  LI, Maoxi   83
  LIU, Chang   91
  LIU, Qun   55
  LIU, Yang   55
  LOPEZ, Adam   152
  LU, Wei   91
  LU, Yajuan   55
  M  
  MA, Yanjun   29
  MATSUZAKI, Takuya   99
  MERMER, Coskun   113
  MI, Haitao   55
  MIYAO, Yusuke   99
  MURAKAMI, Jin'ichi   107
  N  
  NAKAMURA, Satoshi   168
  NAKOV, Preslav   91
  NG, Hwee Tou   91
  O  
  OKAZAKI, Naoaki   99
  OKITA, Tsuyoshi   29
  P  
  PAUL, Michael   1, 168
  S  
  SAKTI, Sakriani   168
  SÁNCHEZ, Joan Andreu   118
  SANCHIS-TRILLES, Germán   37, 160
  SCHWENK, Holger   65
  SHEN, Wade   71
  SLYH, Raymond   71
  SUMITA, Eiichiro   79, 168
  T  
  TOKUHISA, Masato   107
  TSUJII, Jun'ichi   99
  TSUKADA, Hajime   144
  TU, Zhaopeng   55
  U  
  UTIYAMA, Masao   79
  W  
  WATANABE, Taro   144
  WAY, Andy   29
  WU, Xianchao   99
  X  
  XIA, Tian   55
  XIANG, Bing   136
  XIAO, Xinyan   55
  XIE, Jun   55
  XIONG, Deyi   50
  XIONG, Hao   55
  Y  
  YAMAMOTO, Hirofumi   79
  YANG, Mei   124
  Z  
  ZHANG, Hui   50
  ZHANG, Jiajun   83
  ZHANG, Min   50
  ZHENG, Daqi   55
  ZHOU, Bowen   136
  ZHOU, Yu   83
  ZONG, Chengqing   83

Bibliography

@inproceedings{iwslt09:EC:overview,
author= {Michael Paul},
title= {{Overview of the IWSLT 2009 Evaluation Campaign}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {1-18},
}
 
@inproceedings{iwslt09:EC:apptek,
author= {Sel\c{c}uk K\"{o}pr\"{u}},
title= {{AppTek Turkish-English Machine Translation System Description for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {19-23},
}
 
@inproceedings{iwslt09:EC:bmrc,
author= {Marta R. Costa-Juss\`{a} and Rafael E. Banchs},
title= {{Barcelona Media SMT system description for the IWSLT 2009: introducing source context information}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {24-28},
}
 
@inproceedings{iwslt09:EC:dcu,
author= {Yanjun Ma and Tsuyoshi Okita and \"{O}zlem \c{C}etino\u{g}lu and Jinhua Du and Andy Way},
title= {{Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {29-36},
}
 
@inproceedings{iwslt09:EC:fbk,
author= {Nicola Bertoldi and Arianna Bisazza and Mauro Cettolo and Germ\'{a}n Sanchis-Trilles and Marcello Federico},
title= {{FBK @ IWSLT-2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {37-44},
}
 
@inproceedings{iwslt09:EC:greyc,
author= {Yves Lepage and Adrien Lardilleux and Julien Gosme},
title= {{The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {45-49},
}
 
@inproceedings{iwslt09:EC:i2r,
author= {Xiangyu Duan and Deyi Xiong and Hui Zhang and Min Zhang and Haizhou Li},
title= {{I${}^{2}$R's Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {50-54},
}
 
@inproceedings{iwslt09:EC:ict,
author= {Haitao Mi and Yang Liu and Tian Xia and Xinyan Xiao and Yang Feng and Jun Xie and Hao Xiong and Zhaopeng Tu and Daqi Zheng and Yajuan Lu and Qun Liu},
title= {{The ICT Statistical Machine Translation Systems for the IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {55-59},
}
 
@inproceedings{iwslt09:EC:lig,
author= {Fethi Bougares and Laurent Besacier and Herv\'{e} Blanchon},
title= {{LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {60-64},
}
 
@inproceedings{iwslt09:EC:lium,
author= {Holger Schwenk and Lo\"{\i}c Barrault and Yannick Est\`{e}ve and Patrik Lambert},
title= {{LIUM's Statistical Machine Translation Systems for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {65-70},
}
 
@inproceedings{iwslt09:EC:mit,
author= {Wade Shen and Brian Delaney and Arya Ryan Aminzadeh and Timothy Anderson and Raymond Slyh},
title= {{The MIT-LL/AFRL IWSLT-2009 System}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {71-78},
}
 
@inproceedings{iwslt09:EC:nict,
author= {Masao Utiyama and Hirofumi Yamamoto and Eiichiro Sumita},
title= {{Two methods for stabilizing MERT: NICT at IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {79-82},
}
 
@inproceedings{iwslt09:EC:nlpr,
author= {Maoxi Li and Jiajun Zhang and Yu Zhou and Chengqing Zong},
title= {{The CASIA Statistical Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {83-90},
}
 
@inproceedings{iwslt09:EC:nus,
author= {Preslav Nakov and Chang Liu and Wei Lu and Hwee Tou Ng},
title= {{The NUS Statistical Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {91-98},
}
 
@inproceedings{iwslt09:EC:tokyo,
author= {Xianchao Wu and Takuya Matsuzaki and Naoaki Okazaki and Yusuke Miyao and Jun'ichi Tsujii},
title= {{The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {99-106},
}
 
@inproceedings{iwslt09:EC:tottori,
author= {Jin'ichi Murakami and Masato Tokuhisa and Satoru Ikehara},
title= {{Statistical Machine Translation adding Pattern-based Machine translation in Chinese-English Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {107-112},
}
 
@inproceedings{iwslt09:EC:tubitak,
author= {{Co\c{s}kun} Mermer and Hamza Kaya and Mehmet U\v{g}ur Do\v{g}an},
title= {{The T\"{U}BITAK-UEKAE Statistical Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {113-117},
}
 
@inproceedings{iwslt09:EC:upv,
author= {Guillem Gasc\'{o} and Joan Andreu S\'{a}nchez},
title= {{UPV Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {118-123},
}
 
@inproceedings{iwslt09:EC:uw,
author= {Mei Yang and Amittai Axelrod and Kevin Duh and Katrin Kirchhoff},
title= {{The University of Washington Machine Translation System for IWSLT 2009}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {124-128},
}
 
@inproceedings{iwslt09:TP:bisazza,
author= {Arianna Bisazza and Marcello Federico},
title= {{Morphological Pre-Processing for Turkish to English Statistical Machine Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {129-135},
}
 
@inproceedings{iwslt09:TP:cmejrek,
author= {Martin Cmejrek and Bowen Zhou and Bing Xiang},
title= {{Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {136-143},
}
 
@inproceedings{iwslt09:TP:hayashi,
author= {Katsuhiko Hayashi and Taro Watanabe and Hajime Tsukada and Hideki Isozaki},
title= {{Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {144-151},
}
 
@inproceedings{iwslt09:TP:hoang,
author= {Hieu Hoang and Philipp Koehn and Adam Lopez},
title= {{A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {152-159},
}
 
@inproceedings{iwslt09:TP:sanchis,
author= {Germ\'{a}n Sanchis-Trilles and Mauro Cettolo and Nicola Bertoldi and Marcello Federico},
title= {{Online Language Model Adaptation for Spoken Dialog Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {160-167},
}
 
@inproceedings{iwslt09:DEMO:nict,
author= {Chiori Hori and Sakriani Sakti and Michael Paul and Noriyuki Kimura and Yutaka Ashikari and Ryosuke Isotani and Eiichiro Sumita and Satoshi Nakamura},
title= {{Network-based Speech-to-Speech Translation}},
year= {2009},
booktitle= {Proc. of the International Workshop on Spoken Language Translation},
address= {Tokyo, Japan},
pages= {168},
}

Venue

National Museum of Emerging Science and Innovation

2-41, Aomi, Koto-ku, Tokyo, Japan
(Tel) +81-3-3570-9151 (Fax) +81-3-3570-9150
http://www.miraikan.jst.go.jp/en

entrance @ 1F (floor map)
workshop @ 7F (floor map)

Access
Tourism Info

Gallery

IWSLT 2009
December 1-2, 2009
National Museum of Emerging Science and Innovation
Tokyo, Japan


December 1
camera 1


camera 2
December 2
camera 1


camera 2

Organizers

Organizers
  • Alex Waibel (CMU, USA / UKA, Germany)
  • Marcello Federico (FBK, Italy)
  • Satoshi Nakamura (NICT, Japan)

Chairs

  • Eiichiro Sumita (NICT, Japan; Workshop)
  • Michael Paul (NICT, Japan; Evaluation Campaign)
  • Marcello Federico (FBK, Italy; Technical Paper)

Program Committee

  • Laurent Besacier (LIG, France)
  • Francisco Casacuberta (ITI-UPV, Spain)
  • Boxing Chen (NRC, Canada)
  • Philipp Koehn (Univ. Edinburgh, UK)
  • Philippe Langlais (Univ. Montreal, Canada)
  • Geunbae Lee (Postech, Korea)
  • Yves Lepage (GREYC, France)
  • Haizhou Li (I2R, Singapore)
  • Qun Liu (ICT, China)
  • José B. Mariño (TALP-UPC, Spain)
  • Coskun Mermer (TUBITAK, Turkey)
  • Christof Monz (QMUL, UK)
  • Hermann Ney (RWTH, Germany)
  • Holger Schwenk (LIUM, France)
  • Wade Shen (MIT-LL, USA)
  • Hajime Tsukada (NTT, Japan)
  • Haifeng Wang (TOSHIBA, China)
  • Andy Way (DCU, Ireland)
  • Chengqing Zong (CASIA, China)

Local Arrangements

  • Mari Oku (NICT, Japan)

Supporting Organizations

National Institute of Information and Communications Technology
The Scientific and Technological Research Council of Turkey (TUBITAK), National Research Institute of Electronics and Cryptology (UEKAE)

Contact


WORKSHOP ORGANIZATION
Eiichiro Sumita
(reverse) email: jp *dot* co *dot* nict *at* sumita *dot* eiichiro

EVALUATION CAMPAIGN
Michael Paul
(reverse) email: jp *dot* go *dot* nict *at* paul *dot* michael

TECHNICAL PAPER
Marcello Federico
(reverse) email: eu *dot* fbk *at* federico

LOCAL ARRANGEMENT
Mari Oku
(reverse) email: jp *dot* go *dot* nict *dot* khn *at* iwsltlocal09


National Institute of Information and Communications Technology (NICT)
Knowledge Creating Communication Research Center
MASTAR Project

2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan
TEL: +81-774-95-1301
FAX: +81-774-95-1308

References

Events Co-located with IWSLT 2009

IWSLT Evaluation Campaigns

IWSLT2009: 2009年12月 アーカイブ

2009年12月 1日

Theme

The International Workshop on Spoken Language Translation (IWSLT) is a yearly, open evaluation campaign for spoken language translation, followed by a scientific workshop at which both system descriptions and scientific papers are presented. IWSLT's evaluations are not competition-oriented; rather, their goal is to foster cooperative work and scientific exchange. In this respect, IWSLT proposes challenging research tasks and an open experimental infrastructure for the scientific community working on spoken and written language translation.

Evaluation Campaign

The 6th International Workshop on Spoken Language Translation will take place in Tokyo, Japan, in December 2009. The focus of this year's evaluation campaign is the translation of task-oriented human dialogs in travel situations. The speech data was recorded through human interpreters: native speakers of different languages were asked to complete travel-related tasks, such as hotel reservations, using their mother tongue, and the freely uttered conversation was translated by human interpreters. The resulting speech data was annotated with dialog and speaker information. For the Challenge Task, IWSLT participants have to translate both the Chinese and the English outputs of the automatic speech recognizers (lattice, N/1BEST) into English and Chinese, respectively.

As in previous IWSLT events, a standard BTEC Task will be provided. However, the BTEC Task focuses on text input only, i.e., no automatic speech recognition results (lattice, N/1BEST) have to be translated. In addition to the Arabic-English and Chinese-English translation tasks, this year's evaluation campaign features Turkish as a new input language.

Each participant in the evaluation campaign is requested to submit a paper describing the MT system, the utilized resources, and results on the provided test data. Contrastive run submissions using only the bilingual resources provided by IWSLT, as well as investigations into the contribution of each utilized resource, are highly appreciated. Moreover, all participants are requested to present their papers at the workshop.

Scientific Paper

In addition to the evaluation campaign, the IWSLT 2009 workshop also invites scientific paper submissions related to spoken language technologies. Possible topics include, but are not limited to:

  • Spoken dialog modeling
  • Integration of ASR and MT
  • SMT, EBMT, RBMT, Hybrid MT
  • MT evaluation
  • Language resources for MT
  • Open source software for MT
  • Pivot-language-based MT
  • Task adaptation and portability in MT

Posted by mpaul: 10:00

Evaluation Campaign

The evaluation campaign is carried out using BTEC (Basic Travel Expression Corpus), a multilingual speech corpus containing tourism-related sentences similar to those that are usually found in phrasebooks for tourists going abroad. In addition, parts of the SLDB (Spoken Language Databases) corpus, a collection of human-mediated cross-lingual dialogs in travel situations, are provided to the participants of the Challenge Task. Details about the supplied corpora, the data set conditions for each track, the guidelines on how to submit one's translation results, and the evaluation specifications used in this workshop are given below.

Please note that, compared to previous IWSLT evaluation campaigns, the guidelines on how to use the language resources for each data track have changed for IWSLT 2009. Starting in 2007, we encouraged everyone to collect out-of-domain language resources and tools that could be shared between the participants. This was very helpful for many participants and enabled many interesting experiments, but it had the side effect that system outputs became difficult to compare: it was impossible to tell whether certain gains in performance were triggered by better suited (or simply more) language resources (engineering aspects) or by improvements in the underlying decoding algorithms and statistical models (research aspects). After the IWSLT 2008 workshop, many participants asked us to focus on the research aspects for IWSLT 2009.

Therefore, the monolingual and bilingual language resources that may be used to train the translation engines for the primary runs are limited to the supplied corpus for each translation task. This includes all supplied development sets, i.e., you are free to use these data sets as you wish, for tuning of model parameters, as additional training bitext, etc. All other language resources besides the ones for the given translation task should be treated as "additional language resources"; for example, any additional dictionaries, word lists, or bitext corpora, such as those provided by LDC. In addition, some participants asked whether they could use the supplied BTEC TE and BTEC AE resources for the BTEC CE task; these should also be treated as "additional resources". Because it is impossible to limit the usage of linguistic tools like word segmentation tools, parsers, etc., such tools are allowed for preprocessing the supplied corpus, but we kindly ask participants to describe in detail which tools were applied for data preprocessing in their system description paper.

In order to motivate participants to continue to explore the effects of additional language resources (model adaptation, OOV handling, etc.), we DO ACCEPT contrastive runs based on additional resources. These will be evaluated automatically using the same framework as the primary runs, so the results will be directly comparable to this year's primary runs. Due to the workshop budget limits, however, it would be difficult to include all contrastive runs in the subjective evaluation. Therefore, we kindly ask the participants for a contribution if they would like to obtain a human assessment of their contrastive runs as well. If you intend to do so, please contact us as soon as possible, so that we can adjust the evaluation schedule accordingly. Contrastive run results will not appear in the overview paper, but participants are free to report their findings in the MT system description paper or even in a separate scientific paper submission.

[Corpus Specifications]

[Translation Input Conditions]

[Evaluation Specifications]



Corpus Specifications

BTEC Training Corpus:
  • data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <SENTENCE_ID>\01\<MT_TRAINING_SENTENCE>
    • Field_1: sentence ID
    • Field_2: paraphrase ID
    • Field_3: MT training sentence
  • example:
    • TRAIN_00001\01\This is the first training sentence.
    • TRAIN_00002\01\This is the second training sentence.
  • Arabic-English (AE)
  • Chinese-English (CE)
  • Turkish-English (TE)

    • 20K sentences randomly selected from the BTEC corpus
    • coding: UTF-8
    • text is case-sensitive and includes punctuation marks
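The three-field format above can be parsed by splitting each line on the backslash delimiter. A minimal sketch (the function and file names are our own, not part of the official tools):

```python
def parse_corpus_line(line):
    """Split a corpus line of the form <SENTENCE_ID>\\<PARAPHRASE_ID>\\<TEXT>."""
    # Split at most twice: the sentence text never contains the '\'
    # delimiter, but it may be empty.
    sentence_id, paraphrase_id, text = line.rstrip("\n").split("\\", 2)
    return sentence_id, paraphrase_id, text

def load_corpus(path):
    """Read a UTF-8 corpus file into a {sentence ID: sentence} dict."""
    corpus = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            sid, _pid, text = parse_corpus_line(line)
            corpus[sid] = text
    return corpus
```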

BTEC Develop Corpus:

  • text input, reference translations of BTEC sentences
  • data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <SENTENCE_ID>\<PARAPHRASE_ID>\<TEXT>
    • Field_1: sentence ID
    • Field_2: paraphrase ID
    • Field_3: MT develop sentence / reference translation
  • text input example:
    • DEV_001\01\This is the first develop sentence.
    • DEV_002\01\This is the second develop sentence.
  • reference translation example:
  • DEV_001\01\1st reference translation for 1st input
    DEV_001\02\2nd reference translation for 1st input
    ...
    DEV_002\01\1st reference translation for 2nd input
    DEV_002\02\2nd reference translation for 2nd input
    ...
  • Arabic-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations
    • IWSLT05 testset: 506 sentences, 16 reference translations
    • IWSLT07 testset: 489 sentences, 6 reference translations
    • IWSLT08 testset: 507 sentences, 16 reference translations

  • Chinese-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations
    • IWSLT05 testset: 506 sentences, 16 reference translations
    • IWSLT07 testset: 489 sentences, 6 reference translations
    • IWSLT08 testset: 507 sentences, 16 reference translations

  • Turkish-English
    • CSTAR03 testset: 506 sentences, 16 reference translations
    • IWSLT04 testset: 500 sentences, 16 reference translations
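Since each develop-set sentence comes with multiple references distinguished only by the paraphrase ID, evaluation scripts typically group the reference file by sentence ID. A minimal sketch, assuming the reference translation format above:

```python
from collections import defaultdict

def group_references(lines):
    """Group reference translations of the form
    <SENTENCE_ID>\\<PARAPHRASE_ID>\\<REFERENCE> by sentence ID,
    returning the references in paraphrase-ID order."""
    refs = defaultdict(list)
    for line in lines:
        sid, pid, text = line.rstrip("\n").split("\\", 2)
        refs[sid].append((int(pid), text))
    return {sid: [t for _, t in sorted(pairs)] for sid, pairs in refs.items()}
```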

BTEC Test Corpus:

  • Arabic-English
  • Chinese-English
  • Turkish-English
    • 470 unseen sentences of the BTEC evaluation corpus
    • coding: → see BTEC Develop Corpus
    • data format: → see BTEC Develop Corpus


CHALLENGE Training Corpus:
  • TXT data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <DIALOG_ID>\<SENTENCE_ID>\<MT_TRAINING_SENTENCE>
    • Field_1: dialog ID
    • Field_2: sentence ID
    • Field_3: MT training sentence
    • example:
    • train_dialog01\01\This is the first training sentence.
    • train_dialog01\02\This is the second training sentence.
    • ...
  • INFO data format:
    • each line consists of three fields divided by the character '\'
    • sentence consisting of words divided by single spaces
    • format: <DIALOG_ID>\<SENTENCE_ID>\<SPEAKER_TAG>
    • Field_1: dialog ID
    • Field_2: sentence ID
    • Field_3: speaker annotations ('a': agent, 'c': customer, 'i': interpreter)
    • example:
    • train_dialog01\01\a
    • train_dialog01\02\i
    • train_dialog01\03\a
    • ...
    • train_dialog398\20\i
    • train_dialog398\21\i
    • train_dialog398\22\c
  • Chinese-English (CE)
  • English-Chinese (EC)

    • 394 dialogs, 10K sentences from the SLDB corpus
    • coding: UTF-8
    • word segmentations according to ASR output segmentation
    • text is case-sensitive and includes punctuation marks
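Because the TXT and INFO files share the same dialog/sentence IDs, the speaker tag of each training sentence can be recovered with a simple join. A sketch under that assumption (the function name is ours):

```python
def join_speaker_info(txt_lines, info_lines):
    """Attach speaker annotations ('a': agent, 'c': customer,
    'i': interpreter) to training sentences via the shared
    (dialog ID, sentence ID) key."""
    speakers = {}
    for line in info_lines:
        dlg, sid, tag = line.rstrip("\n").split("\\", 2)
        speakers[(dlg, sid)] = tag
    joined = []
    for line in txt_lines:
        dlg, sid, text = line.rstrip("\n").split("\\", 2)
        joined.append((dlg, sid, speakers.get((dlg, sid)), text))
    return joined
```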

CHALLENGE Develop Corpus:

  • ASR output (lattice, NBEST, 1BEST), correct recognition result transcripts (text), reference translations of SLDB dialogs
  • data format:
    • 1-BEST
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\01\<RECOGNITION_HYPOTHESIS>
      • Field_1: sentence ID
      • Field_2: paraphrase ID
      • Field_3: best recognition hypothesis
      • example (input):
      • IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
        IWSLT09_CT.devset_dialog01_06\01\best ASR hypothesis for 3rd utterance
        ...
    • N-BEST
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\01\<RECOGNITION_HYPOTHESIS>
      • Field_1: sentence ID
      • Field_2: NBEST ID (max: 20)
      • Field_3: recognition hypothesis
      • example (input):
      • IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_02\02\2nd-best ASR hypothesis for 1st utterance
        ...
        IWSLT09_CT.devset_dialog01_02\20\20th-best ASR hypothesis for 1st utterance
        IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
        ...
    • reference translations
      • each line consists of three fields divided by the character '\'
      • sentence consisting of words divided by single spaces
      • format: <SENTENCE_ID>\<PARAPHRASE_ID>\<REFERENCE>
      • Field_1: sentence ID
      • Field_2: paraphrase ID
      • Field_3: reference translation
      • example:
      • IWSLT09_CT.devset_dialog01_02\01\1st reference translation for 1st input
        IWSLT09_CT.devset_dialog01_02\02\2nd reference translation for 1st input
        ...
        IWSLT09_CT.devset_dialog01_04\01\1st reference translation for 2nd input
        IWSLT09_CT.devset_dialog01_04\02\2nd reference translation for 2nd input
        ...
  • Chinese-English
    • IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
    • IWSLT06 devset: 489 sentences, 16 reference translations (read speech, spontaneous speech)
    • IWSLT06 testset: 500 sentences, 16 reference translations (read speech, spontaneous speech)
    • IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
    • IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
    • IWSLT09 devset: 10 dialogs, 200 sentences, 4 reference translations (spontaneous speech)
  • English-Chinese
    • IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
    • IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
    • IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
    • IWSLT09 devset: 10 dialogs, 210 sentences, 4 reference translations (spontaneous speech)
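The N-BEST files above use the second field as the hypothesis rank (up to 20 per utterance), so they can be read into best-first lists per sentence ID. A minimal sketch, with names of our own choosing:

```python
from collections import defaultdict

def read_nbest(lines, max_n=20):
    """Collect up to max_n ASR hypotheses per utterance from lines of
    the form <SENTENCE_ID>\\<NBEST_ID>\\<HYPOTHESIS>, returned
    best-first according to the N-best ID."""
    nbest = defaultdict(list)
    for line in lines:
        sid, rank, hyp = line.rstrip("\n").split("\\", 2)
        if int(rank) <= max_n:
            nbest[sid].append((int(rank), hyp))
    return {sid: [h for _, h in sorted(hyps)] for sid, hyps in nbest.items()}
```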

CHALLENGE Test Corpus:

  • Chinese-English
    • 27 dialogs, 405 sentences
    • coding: → see CHALLENGE Develop Corpus
    • TXT data format: → see CHALLENGE Develop Corpus
    • INFO data format: → see CHALLENGE Training Corpus
  • English-Chinese
    • 27 dialogs, 393 sentences
    • coding: → see CHALLENGE Develop Corpus
    • TXT data format: → see CHALLENGE Training Corpus
    • INFO data format: → see CHALLENGE Training Corpus


Translation Input Conditions

Spontaneous Speech

  • Challenge Task
    • Chinese-English
    • English-Chinese
→ ASR output (word lattice, N-best, 1-best) of ASR engines provided by IWSLT organizers

Correct Recognition Results

  • Challenge Task
    • Chinese-English
    • English-Chinese
  • BTEC Task
    • Arabic-English
    • Chinese-English
    • Turkish-English
→ text input

Evaluation

Subjective Evaluation:

  • Metrics:
    • ranking
      (= official evaluation metrics to order MT system scores)
      → all primary run submissions
    • fluency/adequacy
      → top-ranked primary run submission
    • dialog adequacy
      (= adequacy judgments in the context of the given dialog)
      → top-ranked primary run submission
  • Evaluators:
    • 3 graders per translation

Automatic Evaluation:

  • Metrics:
    • BLEU/NIST (NIST v13)
      → bug fixes to handle empty translations and the IWSLT supplied corpus can be found here.
    → up to 7 reference translations
    → all run submissions
  • Evaluation Specifications:
    • case+punc:
      • case sensitive
      • with punctuation marks tokenized
    • no_case+no_punc:
      • case insensitive (lower-case only)
      • no punctuation marks
  • Data Processing Prior to Evaluation:
    • English MT Output:
      • simple tokenization of punctuations (see 'tools/ppEnglish.case+punc.pl' script)
    • Chinese MT Output:
      • segmentation into characters (see 'tools/splitUTF8Characters' script)
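The two scoring conditions differ only in preprocessing. A rough sketch of both steps; the official 'tools/ppEnglish.case+punc.pl' and 'tools/splitUTF8Characters' scripts remain authoritative, and the ASCII-only punctuation set used below is an assumption:

```python
import re
import string

def normalize_no_case_no_punc(text):
    """Approximate the 'no_case+no_punc' condition: lower-case the
    text and strip punctuation marks (ASCII set only, an assumption)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def split_chinese_characters(text):
    """Approximate 'splitUTF8Characters': score Chinese MT output at
    the character level by spacing out consecutive characters."""
    return " ".join(ch for ch in text if not ch.isspace())
```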
  Posted by mpaul: 09:00

    Important Dates

    Evaluation Campaign

    Event Date
    Training Corpus Release June 19, 2009
    Test Corpus Release August 14, 2009
    Run Submission Due Aug 28, 2009
    Result Feedback to Participants September 11, 2009
    MT System Descriptions Due September 18, 2009
    Notification of Acceptance October 16, 2009
    Camera-ready Paper Due October 31, 2009
    Workshop December 1 - 2, 2009

    Technical Papers

    Event Date
    Paper Submission Due August 21, 2009
    Notification of Acceptance October 9, 2009
    Camera-ready Paper Due October 31, 2009
    Workshop December 1 - 2, 2009

    Posted by mpaul: 08:00

    Downloads


    IWSLT 2009 Corpus Release (for IWSLT 2009 participants only)

    User License Agreement DOC, PDF
    Download
      CHALLENGE Task   BTEC Task

    In order to get access to the corpus, please follow the procedure below. Access will be enabled AFTER we have received your original signed user license agreement.

    1. download the post-workshop user license agreement (click on the DOC/PDF link above), sign it, and send two copies to:

      Michael Paul
      National Institute of Information and Communications Technology
      Knowledge Creating Communication Research Center
      MASTAR Project
      Language Translation Group
      3-5 Hikaridai, "Keihanna Science City"
      Kyoto 619-0289, Japan

    2. download the corpus files using the ID and Password you obtained for the download of the training data files for IWSLT 2009.


    Corpus Data Files

    - train:
    • (BTEC) 20K sentence pairs of translation examples with case and punctuation information, segmented according to the utilized ASR engine
    • (CHALLENGE) in addition to BTEC@train, 10K sentence pairs of translation examples with dialog annotations
    - dev:
    • up to 6 evaluation data sets containing 500 source language sentences with multiple references and ASR output data files (= testsets of previous IWSLT evaluation campaigns)
    - test:
    • 500 source language sentences and ASR output data files (= input of the run submissions of this year's evaluation campaign)
    - tools:
    • preprocessing scripts (tokenization, NBEST extraction, etc.) used to prepare the data sets

    For data set details, click on the translation direction name tag.


    CHALLENGE
    Chinese-English
    train dev test tools
    TGZ TGZ TGZ TGZ
    English-Chinese
    train dev test tools
    TGZ TGZ TGZ TGZ

    BTEC
    Arabic-to-English
    train dev test tools
    TGZ TGZ TGZ TGZ
    Chinese-to-English
    train dev test tools
    TGZ TGZ TGZ TGZ
    Turkish-to-English
    train dev test tools
    TGZ TGZ TGZ TGZ



    Templates for LaTeX/MSWord

    - Gzipped TAR archive (all template files): latex_template_iwslt09.tgz

    - LaTeX style: iwslt09.sty
    - Example document: template.tex
    - Example document PS: template.ps
    - Example document PDF: template.pdf
    - Bibliography style: IEEEtran.bst
    - MS-Word template: template.doc

    Posted by mpaul: 06:00

    Submission

    Submissions of Technical Papers and MT System Descriptions must be made electronically in PDF format using the above links. The style files and templates are available at the download page. Authors are strongly encouraged to use the provided LaTeX style files or MS-Word equivalents. Submissions should follow the "Paper Submission Format Guidelines" listed below.

    Paper Submission Format Guidelines
    The format of each paper submission (evaluation campaign and technical paper) should agree with the "Camera-Ready Paper Format Guidelines" listed below.

    Camera-Ready Paper Format Guidelines
    • PDF file format
    • Maximum eight (8) pages (Standard A4 size: 210 mm by 297 mm preferred)
    • Single-spaced
    • Two (2) columns
    • Print in black ink on white paper, and check that the positioning (left and top margins) as well as other layout features are correct.
    • No smaller than nine (9) point type font throughout the paper, including figure captions.
    • To achieve the best viewing experience for the Proceedings, we strongly encourage you to use the Times-Roman font (the LaTeX style file as well as the Word template files use Times-Roman). This is needed in order to give the Proceedings a uniform look.
    • Do NOT include headers and footers. The page numbers and conference identification will be post-processed automatically at the time of printing the Proceedings.
    • The first page should have the paper title, author(s), and affiliation(s) centered on the page across both columns. The remainder of the text must be in the two-column format, staying within the indicated image area.
    • Follow the style of the sample paper that is included with regard to title, authors, affiliations, abstract, heading, and subheadings.
    Paper Title The paper title must be in boldface. All non-function words must be capitalized, and all other words in the title must be lower case. The paper title is centered across the top of the two columns on the first page, as indicated above.
    Authors' Name(s) The authors' name(s) and affiliation(s) appear centered below the paper title. If space permits, include a mailing address here. The templates indicate the area where the title and author information should go. These items need not be strictly confined to the number of lines indicated; papers with multiple authors and affiliations, for example, may require two or more lines for this information.
    Abstract Each paper must contain an abstract that appears at the beginning of the paper.
    Major Headings Major headings are in boldface, with the first word capitalized and the rest of the heading in lower case. Examples of the various levels of headings are included in the templates.
    Sub Headings Sub headings appear like major headings, except they start at the left margin in the column.
    Sub-Sub Headings Sub-sub headings appear like sub headings, except they are in italics and not bold face.
    References Number and list all references at the end of the paper. The references are numbered in order of appearance in the document. When referring to them in the text, type the corresponding reference number in square brackets as shown at the end of this sentence [1]. (This is done automatically when using the Latex template).
    Illustrations Illustrations must appear within the designated margins, and must be positioned within the paper margins. They may span the two columns. If possible, position illustrations at the top of columns, rather than in the middle or at the bottom. Caption and number every illustration. All half-tone or color illustrations must be clear when printed in black and white.


    Templates
    If your paper will be typeset using LaTeX, please download the template package here; it will generate the proper format. To extract the files under UNIX, run: $ tar -xzf latex_template_iwslt09.tgz
    Paper Status
    After submission, each paper will be given a unique Paper ID and a password. This will be shown on the confirmation page right after submission of the documents and a confirmation email including the Paper ID will be sent to the author of the paper as well. It will be possible to check and correct (if necessary) the submitted paper information (names, affiliations, etc.). Corrections/uploads can be made up to the respective submission deadline.

    Paper Acceptance/Rejection Information
    Each corresponding author will be notified by e-mail of acceptance/rejection. Reviewer feedback will also be available for each paper.

    Posted by mpaul: 05:00

    Run Submission Guidelines


    BTEC Translation Task (BTEC_AE, BTEC_CE, BTEC_TE)

    data format:

    • same format as the DEVELOP data sets.
    For details, refer to the respective README files:
    + IWSLT/2009/corpus/BTEC/Arabic-English/README.BTEC_AE.txt
    + IWSLT/2009/corpus/BTEC/Chinese-English/README.BTEC_CE.txt
    + IWSLT/2009/corpus/BTEC/Turkish-English/README.BTEC_TE.txt
    • input text is case-sensitive and contains punctuation marks
    • English MT output should:
      • be in the same format as the input file (<SentenceID>\01\MT_output_text)
      • be case-sensitive, with punctuation marks
      • contain the same number of lines (= sentences) as the input file
    Example:
         TEST_IWSLT09_001\01\This is the E translation of the 1st sentence.
         TEST_IWSLT09_002\01\This is the E translation of the 2nd sentence.
         TEST_IWSLT09_003\01\
         TEST_IWSLT09_004\01\The previous input (ID=003) could not be translated, thus the translation is empty!
         TEST_IWSLT09_005\01\...
         ...
         TEST_IWSLT09_469\01\This is the E translation of the last sentence.
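A submission file can be checked against the input file before mailing it: every input ID must appear, in order, with a (possibly empty) translation. A minimal validation sketch (the function name is our own, not an official tool):

```python
def validate_run(input_lines, output_lines):
    """Check that an MT output file has the same sentence IDs, in the
    same order, as the input file; empty translations are allowed."""
    if len(input_lines) != len(output_lines):
        raise ValueError("line counts differ: %d input vs %d output"
                         % (len(input_lines), len(output_lines)))
    for inp, out in zip(input_lines, output_lines):
        in_id = inp.split("\\", 2)[0]
        out_id = out.split("\\", 2)[0]
        if in_id != out_id:
            raise ValueError("ID mismatch: %s vs %s" % (in_id, out_id))
    return True
```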

    run submission format:
    • each participant has to translate and submit at least one translation of the given input files for each of the translation tasks they registered for.
    • multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for the human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
    • runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
    TAR archive file structure:
    <UserID>/<TranslationTask>.<UserID>.primary.txt
            /<TranslationTask>.<UserID>.contrastive1.txt
            /<TranslationTask>.<UserID>.contrastive2.txt
            /...
        where: <UserID> = user ID of participant used to download data files
               <TranslationTask> = BTEC_AE | BTEC_CE | BTEC_TE

    Examples:
    nict/BTEC_AE.nict.primary.txt
       /BTEC_CE.nict.primary.txt
       /BTEC_CE.nict.contrastive1.txt
       /BTEC_CE.nict.contrastive2.txt
       /BTEC_CE.nict.contrastive3.txt
       /BTEC_TE.nict.primary.txt
       /BTEC_TE.nict.contrastive1.txt      
    • re-submitting your runs is allowed as long as the emails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission email will be used for the IWSLT 2009 evaluation; earlier emails will be ignored.

    CHALLENGE Translation Task (CT_CE, CT_EC)

    data format:

    • same format as the DEVELOP data sets.
    For details, refer to the respective README files:
    + IWSLT/2009/corpus/CHALLENGE/Chinese-English/
      README.CT_CE.txt
    + IWSLT/2009/corpus/CHALLENGE/English-Chinese/
      README.CT_EC.txt
    • the input data sets are created from the speech recognition results (ASR output) and therefore are CASE-INSENSITIVE and do NOT contain punctuation
    • the input data sets of the CHALLENGE tasks are separated according to the source language:
       + Chinese input data:
           IWSLT/2009/corpus/CHALLENGE/Chinese-English/test
       + English input data:
           IWSLT/2009/corpus/CHALLENGE/English-Chinese/test
    The dialog structure is reflected in the respective sentence ID.
    Example:
    (dialog structure)

      IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
      IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
      IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
      IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
      IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
      IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
      ...
    (English input data to be translated into Chinese)
        + IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
          IWSLT09_CT.testset.en.txt

          IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
          IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
          IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
          ...
    (Chinese input data to be translated into English)
        + IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
          IWSLT09_CT.testset.zh.txt

          IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
          IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
          IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
          ...
    • English MT output should:
         + be in the same format as the Chinese input file
           (<SentenceID>\01\MT_output_text)
         + be case-sensitive, with punctuation marks
         + contain the same number of lines (= sentences) as the Chinese input file
    Example:
        + nict/CT_CE.nict.primary.txt
          IWSLT09_CT.testset_dialog01_02\01\...E translation of 1st Chinese utterance...
          IWSLT09_CT.testset_dialog01_04\01\...E translation of 2nd Chinese utterance...
          IWSLT09_CT.testset_dialog01_05\01\...E translation of 3rd Chinese utterance...
          ...
    • Chinese MT output should:
         + be in the same format as the English input file (<SentenceID>\01\MT_output_text)
         + be case-sensitive, with punctuation marks
         + contain the same number of lines (= sentences) as the English input file
    Example:
        + nict/CT_EC.nict.primary.txt
          IWSLT09_CT.testset_dialog01_01\01\...C translation of 1st English utterance...
          IWSLT09_CT.testset_dialog01_03\01\...C translation of 2nd English utterance...
          IWSLT09_CT.testset_dialog01_06\01\...C translation of 3rd English utterance...
          ...

    run submission format:
    • each participant registered for the Challenge Task has to translate both translation directions (English-Chinese AND Chinese-English) and submit a total of 4 MT output files per run:
        + translations of 2 input data conditions (CRR, ASR) for Chinese-English AND
        + translations of 2 input data conditions (CRR, ASR) for English-Chinese.

    (1) the correct recognition result (CRR) data files, i.e., the human transcriptions of the Challenge Task data files that do not include recognition errors:
             CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
                 IWSLT09_CT.testset.zh.txt
             EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
                 IWSLT09_CT.testset.en.txt

    (2) the speech recognition output (ASR output, with recognition errors), whereby the participants are free to choose any of the following three ASR output data types as the input of their MT system:
            (a) word lattices:
                CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                    SLF/testset/*.zh.SLF
                EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                    SLF/testset/*.en.SLF

            (b) NBEST hypotheses:
                CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                    NBEST/IWSLT09.testset.zh.20BEST.txt
                    or
                    IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                    NBEST/testset/*.zh.20BEST.txt

                EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                    NBEST/IWSLT09.testset.en.20BEST.txt
                    or
                    IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                    NBEST/testset/*.en.20BEST.txt
    [NOTE] larger NBEST lists can be generated from the lattice data files using the following tools:
                  + IWSLT/2009/corpus/CHALLENGE/Chinese-English/tools/
                    extract_NBEST.zh.CT_CE.testset.sh
                  + IWSLT/2009/corpus/CHALLENGE/English-Chinese/tools/
                    extract_NBEST.en.CT_EC.testset.sh

            (c) 1BEST hypotheses:
                CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                    1BEST/IWSLT09.testset.zh.1BEST.txt
                    or               
                    IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
                    1BEST/testset/*.zh.1BEST.txt
                EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                    1BEST/IWSLT09.testset.en.1BEST.txt
                    or
                    IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
                    1BEST/testset/*.en.1BEST.txt
    [NOTE] submissions containing only the results for one translation direction will be excluded from the subjective evaluation for IWSLT 2009.
    • multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for the human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
    • runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
    TAR archive file structure:
    <UserID>/CT_CE.<UserID>.primary.CRR.txt
            /CT_CE.<UserID>.primary.ASR.<CONDITION>.txt
            /CT_EC.<UserID>.primary.CRR.txt
            /CT_EC.<UserID>.primary.ASR.<CONDITION>.txt
            /...
    where: <UserID> = user ID of participant used to download data files
          <CONDITION> = SLF | <NUM>
          <NUM> = number of recognition hypotheses used for translation, e.g.,
                    '1'  - 1-best recognition result
                    '20' - 20-best hypotheses list
    Examples:
        nict/CT_CE.nict.primary.CRR.txt
            /CT_CE.nict.primary.ASR.SLF.txt
            /CT_EC.nict.primary.CRR.txt
            /CT_EC.nict.primary.ASR.SLF.txt

            /CT_CE.nict.contrastive1.CRR.txt
            /CT_CE.nict.contrastive1.ASR.1.txt
            /CT_EC.nict.contrastive1.CRR.txt
            /CT_EC.nict.contrastive1.ASR.1.txt

            /CT_CE.nict.contrastive2.CRR.txt
            /CT_CE.nict.contrastive2.ASR.20.txt
            /CT_EC.nict.contrastive2.CRR.txt
            /CT_EC.nict.contrastive2.ASR.20.txt      
    • re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2009 evaluation; previous mails will be ignored.
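    As an illustration of the required archive layout, here is a minimal shell sketch that packages PRIMARY run files into a gzipped TAR archive. The user ID "acme" and the empty placeholder files are hypothetical; real submissions would contain your actual MT outputs under your own download user ID.

    ```shell
    # Sketch: build <UserID>.tgz with the directory layout described above.
    # "acme" is a stand-in for your actual <UserID>.
    set -e
    UserID="acme"
    mkdir -p "$UserID"
    # create placeholder run files (real files would hold your translations)
    for f in CT_CE.$UserID.primary.CRR.txt \
             CT_CE.$UserID.primary.ASR.SLF.txt \
             CT_EC.$UserID.primary.CRR.txt \
             CT_EC.$UserID.primary.ASR.SLF.txt; do
      touch "$UserID/$f"
    done
    # gzipped TAR archive rooted at the <UserID> directory
    tar czf "$UserID.tgz" "$UserID"
    # list the archive contents to verify the layout
    tar tzf "$UserID.tgz"
    ```

    Contrastive runs would be added the same way, with "contrastive1", "contrastive2", etc. in place of "primary".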

    Posted by mpaul : 04:45 | Trackback

    Automatic Evaluation Server

    We prepared an online evaluation server that allows you to conduct additional experiments to confirm the effectiveness of innovative methods and features within the IWSLT 2009 evaluation framework. You can submit translation hypothesis files for any of the IWSLT 2009 translation tasks. The hypothesis file format is the same as for the official run submissions.

    Before you can submit runs, you have to register a UserID/PassID. After login, click on "Make a new Submission", select the "Translation Direction" and "Training Data Condition" you used to generate the hypothesis file, upload the hypothesis file, specify a system ID and a short description that allows you to easily identify the run submission, and press "Calculate Scores".

    The server will sequentially calculate automatic scores for BLEU, NIST, WER, PER, TER, METEOR, F1, Precision, Recall, and GTM. Finally, the automatic scoring results will be sent to you via email. In addition, you can access the "Submission Log", which keeps track of all your run submissions. For details on a specific run, please click on the respective "Date". The scoring results of the "case+punc" evaluation specification (case-sensitive, with punctuation) are displayed in bold-face, and the scoring results of the "no_case+no_punc" evaluation specification (case-insensitive, without punctuation) are displayed in brackets.
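    To illustrate one of these metrics, below is a minimal, unofficial sketch of word error rate (WER): word-level edit distance between reference and hypothesis, normalized by reference length. The official server uses its own tokenization and scoring tools; this is only a conceptual example.

    ```python
    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate: Levenshtein distance over words / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
        return dp[len(ref)][len(hyp)] / max(len(ref), 1)

    # one deletion ("is") and one substitution ("the" -> "a") against 6 reference words
    print(wer("the hotel is near the station", "the hotel near a station"))  # -> 0.333...
    ```

    The other metrics (BLEU, NIST, METEOR, GTM, etc.) follow the same pattern of comparing hypothesis files against the reference translations, but with different matching criteria.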

    Registration


    Workshop

    The registration for the IWSLT 2009 workshop is now open. Please access the registration server and fill out the registration form.

    Registration Fees:

                Regular      Student      Deadline         Payment
    Early       JPY 20,000   JPY 15,000   Nov 20, 2009     online
    Late        JPY 25,000   JPY 20,000   Nov 30, 2009     at the door (registration desk)
    On-site     JPY 30,000   JPY 25,000   Dec 1 - 2, 2009  at the door (registration desk)

    The registration fees include: daily lunch, coffee breaks, a USB-stick version of the proceedings, participation in all sessions, and a banquet dinner on December 1. Please note that the registration fee is not refundable under any circumstances.

    Online payment is only possible until the extended Early Registration deadline. During the Late Registration period, you have to fill out the registration form online, but payment has to be made at the workshop registration desk (7F Miraikan).

    For payments at the registration desk, only cash can be accepted.

    If you need a visa for coming to Japan, please contact the IWSLT Secretariat (iwslt@the-convention.co.jp) as soon as possible, but not later than October 2nd, 2009.

    If you don't know whether you need a visa or not, please check here.



    Accommodations

    Hotel rates differ from day to day.
    Please check the links below or contact the hotel directly.


    ホテル グランパシフィック LE DAIBA
    (Grand Pacific Le Daiba)

    [map] [reservations]
    「〒135-8701 東京都港区台場2-6-1」
    2-6-1 Daiba, Minato-ku, Tokyo 135-8701, Japan
    (Tel) +81-3-5500-6711 (Fax) +81-3-5500-4507
    http://www.grandpacific.jp/eng (English)
    http://www.grandpacific.jp (Japanese)
    from 19,000 JPY (1 person, 1 night)
    (U07) Daiba station [map]
    ←1min,180JPY→ (U08) Fune-no-kagakukan station [venue]

    三井ガーデンホテル汐留イタリア街
    (Mitsui Garden Hotel Shiodome Italia-gai)

    [map] [reservations]
    「〒105-0021 東京都港区東新橋2-14-24」
    2-14-24, Higashi-shimbashi, Minato-ku, Tokyo 105-0021, Japan
    (Tel) +81-3-3431-1131 (Fax) +81-3-3431-2431
    http://www.gardenhotels.co.jp/eng/shiodome.html (English)
    http://www.gardenhotels.co.jp/shiodome/index.html (Japanese)
    from 10,300 JPY (1 person, 1 night)
    (U02) Shiodome station [map]
    ←15min,310JPY→ (U08) Fune-no-kagakukan station [venue]

    ホテルヴィラ フォンテーヌ汐留
    (Hotel Villa Fontaine Shiodome)

    [map] [reservations]
    「〒105-0021 東京都港区東新橋1-9-2」
    1-9-2 Higashi-shinbashi, Minato-ku, Tokyo 105-0021, Japan
    (Tel) +81-3-3569-2220 (Fax) +81-3-3569-2111
    http://www.hvf.jp/eng/shiodome.php (English)
    http://www.hvf.jp/shiodome (Japanese)
    http://www.hvf.jp/chi/shiodome.html (Chinese)
    from 10,000 JPY (1 person, 1 night)
    (U02) Shiodome station [map]
    ←15min,310JPY→ (U08) Fune-no-kagakukan station [venue]

    ホテル日航東京
    (Hotel Nikko Tokyo)

    [map] [reservations]
    「〒135-8625 東京都港区台場1丁目9番1号」
    1-9-1 Daiba, Minato-ku, Tokyo 135-8625, Japan
    (Tel) +81-3-5500-5500 (Fax) +81-3-5500-2525
    http://www.hnt.co.jp/en/index.html (English)
    http://www.hnt.co.jp/ (Japanese)
    http://www.jalhotels.com/cn/domestic/kanto/index.html#tokyo (Chinese)
    from 9,500 JPY (1 person, 1 night)
    (U07) Daiba station [map]
    ←1min,180JPY→ (U08) Fune-no-kagakukan station [venue]

    ホテルトラスティ東京ベイサイド
    (Hotel Trusty Tokyo Bayside)

    [map] [reservations]
    「〒135-0063 東京都江東区有明3-1-5」
    3-1-5 Ariake, Koto-ku, Tokyo 135-0063, Japan
    (Tel) +81-3-6700-0001 (Fax) +81-3-6700-0007
    http://www.trusty.jp/tokyobayside/pdf/tokyobayside_e.pdf (English)
    http://www.trusty.jp/tokyobayside (Japanese)
    from 6,700 JPY (1 person, 1 night)
    (U11) Kokusai-tenjijou-seimon station [map]
    ←6min,240JPY→ (U08) Fune-no-kagakukan station [venue]

    ホテルサンルート有明
    (Hotel Sunroute Ariake)

    [map] [reservations]
    「〒135-0063 東京都江東区有明3-1-20」
    3-1-20 Ariake, Koto-ku, Tokyo 135-0063, Japan
    (Tel) +81-3-5530-3610
    http://www.sunroutehotel.jp/hari-eng/index.asp (English)
    http://www.sunroutehotel.jp/ariake/ (Japanese)
    http://www.sunroutehotel.jp/hari-chi/index.asp (Chinese)
    from 6,500 JPY (1 person, 1 night)
    (U11) Kokusai-tenjijou-seimon station [map]
    ←6min,240JPY→ (U08) Fune-no-kagakukan station [venue]

    東京ベイ有明ワシントンホテル
    (Tokyo Bay Ariake Washington Hotel)

    [map] [reservations]
    「〒135-0063 東京都江東区有明3-1-28」
    3-1-28 Ariake, Koto-ku, Tokyo 135-0063, Japan
    (Tel) +81-3-5564-0111 (Fax) +81-3-5564-0525
    http://www.wh-rsv.com/english/tokyo_bay_ariake (English)
    http://www.wh-rsv.com/wh/hotels/ariake/index.html (Japanese)
    http://www.wh-rsv.com/chinese/tokyo_bay_ariake (Chinese)
    from 6,000 JPY (1 person, 1 night)
    (U12) Ariake station [map]
    ←7min,240JPY→ (U08) Fune-no-kagakukan station [venue]


    Program


    December 1, 2009

    09:00 09:30 workshop registration

    Workshop Opening
    09:30 09:40 Welcome Remarks
    Satoshi NAKAMURA (NICT, Japan)
    Evaluation Campaign: "Overview Talk"
    09:40 10:10 Overview of the IWSLT 2009 Evaluation Campaign
    Michael PAUL (NICT, Japan)
    coffee break
    Evaluation Campaign: "Challenge Task"
    10:30 11:00 Two methods for stabilizing MERT: NICT at IWSLT 2009
    Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA (NICT, Japan)
    11:00 11:30 Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009
    Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY (Dublin City University, Ireland)
    11:30 12:00 The CASIA Statistical Machine Translation System for IWSLT 2009
    Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; China)
    lunch break
    Invited Talk
    13:00 14:00 Human Translation and Machine Translation
    Philipp KOEHN (University of Edinburgh, UK)
    Technical Paper: "Oral I"
    14:00 14:30 Morphological Pre-Processing for Turkish to English Statistical Machine Translation
    Arianna BISAZZA, Marcello FEDERICO (FBK-irst, Italy)
    14:30 15:00 Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing
    Martin CMEJREK, Bowen ZHOU, Bing XIANG (IBM, USA)
    15:00 15:30 A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation
    Hieu HOANG, Philipp KOEHN, Adam LOPEZ (Univ. Edinburgh, UK)
    coffee break
    Evaluation Campaign: "Poster I"
    15:50 16:50 The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009
    Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN (TÜBITAK-UEKAE, Turkey)
    15:50 16:50 The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures
    Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII (University of Tokyo, Japan)
    15:50 16:50 The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory
    Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME (University of Caen, France)
    15:50 16:50 The ICT Statistical Machine Translation Systems for the IWSLT 2009
    Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU (Institute of Computing Technology, Chinese Academy of Sciences; China)
    15:50 16:50 The University of Washington Machine Translation System for IWSLT 2009
    Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF (University of Washington, USA)
    15:50 16:50 Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation
    Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA (Tottori University, Japan)
    Demo Session
    16:50 17:20 Network-based Speech-to-Speech Translation
    Chiori HORI, Sakriani SAKTI, Michael PAUL, Satoshi NAKAMURA (NICT, Japan)
    Banquet
    18:00 20:00 Restaurant "LA TERRE" (Miraikan, 7F)

    December 2, 2009

    09:00 09:30 workshop registration

    Invited Talk
    09:30 10:30 Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
    Premkumar NATARAJAN (BBN Technologies, USA)
    coffee break
    Technical Paper: "Oral II"
    10:50 11:20 Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation
    Katsuhiko HAYASHI (Doshisha University, Japan); Taro WATANABE, Hajime TSUKADA and Hideki ISOZAKI (NTT, Japan)
    11:20 11:50 Online Language Model Adaptation for Spoken Dialog Translation
    Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain); Mauro CETTOLO, Nicola BERTOLDI, Marcello FEDERICO (FBK-irst, Italy)
    lunch break
    Invited Talk
    13:00 14:00 Monolingual Knowledge Acquisition and a Multilingual Information Environment
    Kentaro TORISAWA (NICT, Japan)
    Evaluation Campaign: "Poster II"
    14:00 15:00 AppTek Turkish-English Machine Translation System Description for IWSLT 2009
    Selçuk KÖPRÜ (AppTek Inc., Turkey)
    14:00 15:00 LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic
    Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
    14:00 15:00 Barcelona Media SMT system description for the IWSLT 2009: introducing source context information
    Marta R. COSTA-JUSSA, Rafael E. BANCHS (Barcelona Media, Spain)
    14:00 15:00 FBK @ IWSLT-2009
    Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
    14:00 15:00 LIUM's Statistical Machine Translation Systems for IWSLT 2009
    Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT (University of Le Mans, France)
    14:00 15:00 I²R's Machine Translation System for IWSLT 2009
    Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI (Institute for Infocomm Research, Singapore)
    coffee break
    Evaluation Campaign: "BTEC Task"
    15:20 15:50 The NUS Statistical Machine Translation System for IWSLT 2009
    Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG (National University of Singapore, Singapore)
    15:50 16:20 The UPV Translation System for IWSLT 2009
    Guillem GASCÓ, Joan Andreu SÁNCHEZ (Universitat Politècnica de València, Spain)
    16:20 16:50 The MIT-LL/AFRL System for IWSLT 2009
    Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory, USA)
    Workshop Closing
    16:50 17:00 Closing Remarks
    Marcello FEDERICO (FBK-irst, Italy)


    Keynote Speeches


    Keynote Speech 1

    Human Translation and Machine Translation
    Philipp KOEHN (University of Edinburgh, UK)
    While most recent machine translation work has focused on the gisting application (i.e., translating web pages), another important application is to aid human translators. To build better computer-aided translation tools, we first need to understand how human translators work. We discuss how human translators work and what tools they typically use. We also built a novel tool that offers post-editing, interactive sentence completion, and display of translation options (online at www.caitra.org). We collected timing logs of interactions with the tool, which allow a detailed analysis of translator behavior.

    Keynote Speech 2

    Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
    Premkumar NATARAJAN (BBN Technologies, USA)
    Two-way speech-to-speech (S2S) translation is a spoken language application that integrates multiple technologies including speech recognition, machine translation, text-to-speech synthesis, and dialog management. In recent years, research into S2S systems has resulted in several modeling techniques for improving coverage on broad domains and rapid configuration for new language pairs or domains. This talk will highlight recent advances in the S2S area, ranging from improvements in component technologies to improvements in the end-to-end system for mobile use. I will also present metrics for evaluating S2S technology, a methodology for determining the impact of different causes of errors, and future directions for research and development.

    Keynote Speech 3

    Monolingual Knowledge Acquisition and a Multilingual Information Environment
    Kentaro TORISAWA (NICT, Japan)
    Large-scale knowledge acquisition from the Web has been a popular research topic in the last five years. This talk gives an overview of our current project, which aims at the acquisition of a large-scale semantic network from the Web, and explores its possible interaction with machine translation research. In particular, I would like to focus on two topics: multilingual corpora as a source of knowledge, and the applications of machine translation enabled by our technology. I will discuss a framework of bilingual co-training that gives a marked improvement in the accuracy of the acquired knowledge by using two corpora written in two different languages. I will also show that our technology can enable new types of machine translation tasks in Web applications.


    Proceedings

    - Author Index -

    Evaluation Campaign
    pp.1-18 paper slides bib Overview of the IWSLT 2009 Evaluation Campaign
    Michael PAUL
    pp.19-23 paper (not yet) bib apptek
    AppTek Turkish-English Machine Translation System Description for IWSLT 2009
    Selçuk KÖPRÜ
    pp.24-28 paper poster bib bmrc
    Barcelona Media SMT system description for the IWSLT 2009: introducing source context information
    Marta R. COSTA-JUSSA, Rafael E. BANCHS
    pp.29-36 paper slides bib dcu
    Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009
    Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY
    pp.37-44 paper poster bib fbk
    FBK @ IWSLT-2009
    Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
    pp.45-49 paper poster bib greyc
    The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory
    Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME
    pp.50-54 paper poster bib i2r
    I²R's Machine Translation System for IWSLT 2009
    Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI
    pp.55-59 paper poster bib ict
    The ICT Statistical Machine Translation Systems for the IWSLT 2009
    Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU
    pp.60-64 paper poster bib lig
    LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic
    Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
    pp.65-70 paper poster bib lium
    LIUM's Statistical Machine Translation Systems for IWSLT 2009
    Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT
    pp.71-78 paper slides bib mit
    The MIT-LL/AFRL IWSLT-2009 System
    Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory)
    pp.79-82 paper slides bib nict
    Two methods for stabilizing MERT: NICT at IWSLT 2009
    Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA
    pp.83-90 paper slides bib nlpr
    The CASIA Statistical Machine Translation System for IWSLT 2009
    Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG
    pp.91-98 paper slides bib nus
    The NUS Statistical Machine Translation System for IWSLT 2009
    Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG
    pp.99-106 paper poster bib tokyo
    The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures
    Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII
    pp.107-112 paper poster bib tottori
    Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation
    Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA
    pp.113-117 paper poster bib tubitak
    The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009
    Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN
    pp.118-123 paper slides bib upv
    UPV Translation System for IWSLT 2009
    Guillem GASCÓ, Joan Andreu SÁNCHEZ
    pp.124-128 paper poster bib uw
    The University of Washington Machine Translation System for IWSLT 2009
    Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF

    Technical Paper
    pp.129-135 paper slides bib Morphological Pre-Processing for Turkish to English Statistical Machine Translation
    Arianna BISAZZA, Marcello FEDERICO
    pp.136-143 paper slides bib Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing
    Martin CMEJREK, Bowen ZHOU, Bing XIANG
    pp.144-151 paper slides bib Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation
    Katsuhiko HAYASHI, Taro WATANABE, Hajime TSUKADA, Hideki ISOZAKI
    pp.152-159 paper slides bib A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation
    Hieu HOANG, Philipp KOEHN, Adam LOPEZ
    pp.160-167 paper slides bib Online Language Model Adaptation for Spoken Dialog Translation
    Germán SANCHIS-TRILLES, Mauro CETTOLO, Nicola BERTOLDI, Marcello FEDERICO

    Demo
    pp.168-168 paper slides bib Network-based Speech-to-Speech Translation
    Chiori HORI, Sakriani SAKTI, Michael PAUL, Noriyuki KIMURA, Yutaka ASHIKARI, Ryosuke ISOTANI, Eiichiro SUMITA, Satoshi NAKAMURA (NICT, Japan)

    Keynote Speech
          -       abstract slides - Human Translation and Machine Translation
    Philipp KOEHN (University of Edinburgh, UK)
          -       abstract (not yet) - Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
    Premkumar NATARAJAN (BBN Technologies, USA)
          -       abstract slides - Monolingual Knowledge Acquisition and a Multilingual Information Environment
    Kentaro TORISAWA (NICT, Japan)


    Author Index

    A-B-C-D-E-F-G-H-I-K-L-M-N-O-P-S-T-U-W-X-Y-Z

      A  
      AMINZADEH, Arya Ryan  71
      ANDERSON, Timothy   71
      ASHIKARI, Yutaka   168
      AXELROD, Amittai   124
      B  
      BANCHS, Rafael E.   24
      BARRAULT, Loïc   65
      BERTOLDI, Nicola   37, 160
      BESACIER, Laurent   60
      BISAZZA, Arianna   37, 129
      BLANCHON, Hervé   60
      BOUGARES, Fethi   60
      C  
      ÇETINOGLU, Özlem   29
      CETTOLO, Mauro   37, 160
      CMEJREK, Martin   136
      COSTA-JUSSÀ, Marta R.   24
      D  
      DELANEY, Brian   71
      DOGAN, Mehmet Ugur   113
      DU, Jinhua   29
      DUAN, Xiangyu   50
      DUH, Kevin   124
      E  
      ESTÈVE, Yannick   65
      F  
      FEDERICO, Marcello   37, 129, 160
      FENG, Yang   55
      G  
      GASCÓ, Guillem   118
      GOSME, Julien   45
      H  
      HAYASHI, Katsuhiko   144
      HOANG, Hieu   152
      HORI, Chiori   168
      I  
      IKEHARA, Satoru   107
      ISOTANI, Ryosuke   168
      ISOZAKI, Hideki   144
      K  
      KAYA, Hamza   113
      KIMURA, Noriyuki   168
      KIRCHHOFF, Katrin   124
      KOEHN, Philipp   152
      KÖPRÜ, Selçuk   19
      L  
      LAMBERT, Patrik   65
      LARDILLEUX, Adrien   45
      LEPAGE, Yves   45
      LI, Haizhou   50
      LI, Maoxi   83
      LIU, Chang   91
      LIU, Qun   55
      LIU, Yang   55
      LOPEZ, Adam   152
      LU, Wei   91
      LU, Yajuan   55
      M  
      MA, Yanjun   29
      MATSUZAKI, Takuya   99
      MERMER, Coskun   113
      MI, Haitao   55
      MIYAO, Yusuke   99
      MURAKAMI, Jin'ichi   107
      N  
      NAKAMURA, Satoshi   168
      NAKOV, Preslav   91
      NG, Hwee Tou   91
      O  
      OKAZAKI, Naoaki   99
      OKITA, Tsuyoshi   29
      P  
      PAUL, Michael   1, 168
      S  
      SAKTI, Sakriani   168
      SÁNCHEZ, Joan Andreu   118
      SANCHIS-TRILLES, Germán   37, 160
      SCHWENK, Holger   65
      SHEN, Wade   71
      SLYH, Raymond   71
      SUMITA, Eiichiro   79, 168
      T  
      TOKUHISA, Masato   107
      TSUJII, Jun'ichi   99
      TSUKADA, Hajime   144
      TU, Zhaopeng   55
      U  
      UTIYAMA, Masao   79
      W  
      WATANABE, Taro   144
      WAY, Andy   29
      WU, Xianchao   99
      X  
      XIA, Tian   55
      XIANG, Bing   136
      XIAO, Xinyan   55
      XIE, Jun   55
      XIONG, Deyi   50
      XIONG, Hao   55
      Y  
      YAMAMOTO, Hirofumi   79
      YANG, Mei   124
      Z  
      ZHANG, Hui   50
      ZHANG, Jiajun   83
      ZHANG, Min   50
      ZHENG, Daqi   55
      ZHOU, Bowen   136
      ZHOU, Yu   83
      ZONG, Chengqing   83


    Bibliography

    @inproceedings{iwslt09:EC:overview,
    author= {Michael Paul},
    title= {{Overview of the IWSLT 2009 Evaluation Campaign}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {1-18},
    }
     
    @inproceedings{iwslt09:EC:apptek,
    author= {Sel\c{c}uk K\"{o}pr\"{u}},
    title= {{AppTek Turkish-English Machine Translation System Description for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {19-23},
    }
     
    @inproceedings{iwslt09:EC:bmrc,
    author= {Marta R. Costa-Juss\`{a} and Rafael E. Banchs},
    title= {{Barcelona Media SMT system description for the IWSLT 2009: introducing source context information}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {24-28},
    }
     
    @inproceedings{iwslt09:EC:dcu,
    author= {Yanjun Ma and Tsuyoshi Okita and \"{O}zlem \c{C}etino\u{g}lu and Jinhua Du and Andy Way},
    title= {{Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {29-36},
    }
     
    @inproceedings{iwslt09:EC:fbk,
    author= {Nicola Bertoldi and Arianna Bisazza and Mauro Cettolo and Germ\'{a}n Sanchis-Trilles and Marcello Federico},
    title= {{FBK @ IWSLT-2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {37-44},
    }
     
    @inproceedings{iwslt09:EC:greyc,
    author= {Yves Lepage and Adrien Lardilleux and Julien Gosme},
    title= {{The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {45-49},
    }
     
    @inproceedings{iwslt09:EC:i2r,
    author= {Xiangyu Duan and Deyi Xiong and Hui Zhang and Min Zhang and Haizhou Li},
    title= {{I${}^{2}$R's Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {50-54},
    }
     
    @inproceedings{iwslt09:EC:ict,
    author= {Haitao Mi and Yang Liu and Tian Xia and Xinyan Xiao and Yang Feng and Jun Xie and Hao Xiong and Zhaopeng Tu and Daqi Zheng and Yajuan Lu and Qun Liu},
    title= {{The ICT Statistical Machine Translation Systems for the IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {55-59},
    }
     
    @inproceedings{iwslt09:EC:lig,
    author= {Fethi Bougares and Laurent Besacier and Herv\'{e} Blanchon},
    title= {{LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {60-64},
    }
     
    @inproceedings{iwslt09:EC:lium,
    author= {Holger Schwenk and Lo\"{i}c Barrault and Yannick Est\`{e}ve and Patrik Lambert},
    title= {{LIUM's Statistical Machine Translation Systems for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {65-70},
    }
     
    @inproceedings{iwslt09:EC:mit,
    author= {Wade Shen and Brian Delaney and Arya Ryan Aminzadeh and Timothy Anderson and Raymond Slyh},
    title= {{The MIT-LL/AFRL IWSLT-2009 System}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {71-78},
    }
     
    @inproceedings{iwslt09:EC:nict,
    author= {Masao Utiyama and Hirofumi Yamamoto and Eiichiro Sumita},
    title= {{Two methods for stabilizing MERT: NICT at IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {79-82},
    }
     
    @inproceedings{iwslt09:EC:nlpr,
    author= {Maoxi Li and Jiajun Zhang and Yu Zhou and Chengqing Zong},
    title= {{The CASIA Statistical Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {83-90},
    }
     
    @inproceedings{iwslt09:EC:nus,
    author= {Preslav Nakov and Chang Liu and Wei Lu and Hwee Tou Ng},
    title= {{The NUS Statistical Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {91-98},
    }
     
    @inproceedings{iwslt09:EC:tokyo,
    author= {Xianchao Wu and Takuya Matsuzaki and Naoaki Okazaki and Yusuke Miyao and Jun'ichi Tsujii},
    title= {{The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {99-106},
    }
     
    @inproceedings{iwslt09:EC:tottori,
    author= {Jin'ichi Murakami and Masato Tokuhisa and Satoru Ikehara},
    title= {{Statistical Machine Translation adding Pattern-based Machine translation in Chinese-English Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {107-112},
    }
     
    @inproceedings{iwslt09:EC:tubitak,
    author= {{Co\c{s}kun} Mermer and Hamza Kaya and Mehmet U\v{g}ur Do\v{g}an},
    title= {{The T\"{U}BITAK-UEKAE Statistical Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {113-117},
    }
     
    @inproceedings{iwslt09:EC:upv,
    author= {Guillem Gasc\'{o} and Joan Andreu S\'{a}nchez},
    title= {{UPV Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {118-123},
    }
     
    @inproceedings{iwslt09:EC:uw,
    author= {Mei Yang and Amittai Axelrod and Kevin Duh and Katrin Kirchhoff},
    title= {{The University of Washington Machine Translation System for IWSLT 2009}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {124-128},
    }
     
    @inproceedings{iwslt09:TP:bisazza,
    author= {Arianna Bisazza and Marcello Federico},
    title= {{Morphological Pre-Processing for Turkish to English Statistical Machine Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {129-135},
    }
     
    @inproceedings{iwslt09:TP:cmejrek,
    author= {Martin Cmejrek and Bowen Zhou and Bing Xiang},
    title= {{Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {136-143},
    }
     
    @inproceedings{iwslt09:TP:hayashi,
    author= {Katsuhiko Hayashi and Taro Watanabe and Hajime Tsukada and Hideki Isozaki},
    title= {{Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {144-151},
    }
     
    @inproceedings{iwslt09:TP:hoang,
    author= {Hieu Hoang and Philipp Koehn and Adam Lopez},
    title= {{A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {152-159},
    }
     
    @inproceedings{iwslt09:TP:sanchis,
    author= {Germ\'{a}n Sanchis-Trilles and Mauro Cettolo and Nicola Bertoldi and Marcello Federico},
    title= {{Online Language Model Adaptation for Spoken Dialog Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {160-167},
    }
     
    @inproceedings{iwslt09:DEMO:nict,
    author= {Chiori Hori and Sakriani Sakti and Michael Paul and Noriyuki Kimura and Yutaka Ashikari and Ryosuke Isotani and Eiichiro Sumita and Satoshi Nakamura},
    title= {{Network-based Speech-to-Speech Translation}},
    year= {2009},
    booktitle= {Proc. of the International Workshop on Spoken Language Translation},
    address= {Tokyo, Japan},
    pages= {168},
    }
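For reference, the proceedings entries above can be cited directly from LaTeX via BibTeX; a minimal sketch, assuming the entries are saved to a hypothetical file `iwslt09.bib`:

```latex
\documentclass{article}
\begin{document}
% Cite an IWSLT 2009 proceedings entry by its key
As shown by Bisazza and Federico~\cite{iwslt09:TP:bisazza},
morphological pre-processing can help Turkish-to-English SMT.

\bibliographystyle{plain}
\bibliography{iwslt09} % hypothetical .bib file holding the entries above
\end{document}
```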

    Posted by mpaul : 04:08 | Trackback

    Venue

    National Museum of Emerging Science and Innovation

    2-41, Aomi, Koto-ku, Tokyo, Japan
    (Tel) +81-3-3570-9151 (Fax) +81-3-3570-9150
    http://www.miraikan.jst.go.jp/en

    Entrance: 1F (floor map)
    Workshop: 7F (floor map)

    Access
    Tourism Info


    Gallery

    IWSLT 2009
    December 1-2, 2009
    National Museum of Emerging Science and Innovation
    Tokyo, Japan


    December 1: camera 1, camera 2
    December 2: camera 1, camera 2


    Organizers

    Organizers
    • Alex Waibel (CMU, USA / UKA, Germany)
    • Marcello Federico (FBK, Italy)
    • Satoshi Nakamura (NICT, Japan)

    Chairs

    • Eiichiro Sumita (NICT, Japan; Workshop)
    • Michael Paul (NICT, Japan; Evaluation Campaign)
    • Marcello Federico (FBK, Italy; Technical Paper)

    Program Committee

    • Laurent Besacier (LIG, France)
    • Francisco Casacuberta (ITI-UPV, Spain)
    • Boxing Chen (NRC, Canada)
    • Philipp Koehn (Univ. Edinburgh, UK)
    • Philippe Langlais (Univ. Montreal, Canada)
    • Geunbae Lee (Postech, Korea)
    • Yves Lepage (GREYC, France)
    • Haizhou Li (I2R, Singapore)
    • Qun Liu (ICT, China)
    • José B. Mariño (TALP-UPC, Spain)
    • Coskun Mermer (TUBITAK, Turkey)
    • Christof Monz (QMUL, UK)
    • Hermann Ney (RWTH, Germany)
    • Holger Schwenk (LIUM, France)
    • Wade Shen (MIT-LL, USA)
    • Hajime Tsukada (NTT, Japan)
    • Haifeng Wang (TOSHIBA, China)
    • Andy Way (DCU, Ireland)
    • Chengqing Zong (CASIA, China)

    Local Arrangements

    • Mari Oku (NICT, Japan)

    Supporting Organizations

    National Institute of Information and Communications Technology
    The Scientific and Technological Research Council of Turkey (TUBITAK), National Research Institute of Electronics and Cryptology (UEKAE)


    Contact

    WORKSHOP ORGANIZATION
    Eiichiro Sumita
    (reverse) email: jp *dot* co *dot* nict *at* sumita *dot* eiichiro

    EVALUATION CAMPAIGN
    Michael Paul
    (reverse) email: jp *dot* go *dot* nict *at* paul *dot* michael

    TECHNICAL PAPER
    Marcello Federico
    (reverse) email: eu *dot* fbk *at* federico

    LOCAL ARRANGEMENT
    Mari Oku
    (reverse) email: jp *dot* go *dot* nict *dot* khn *at* iwsltlocal09


    National Institute of Information and Communications Technology (NICT)
    Knowledge Creating Communication Research Center
    MASTAR Project

    2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan
    TEL: +81-774-95-1301
    FAX: +81-774-95-1308


    References

    Events Co-located with IWSLT 2009

    IWSLT Evaluation Campaigns

