Theme
The International Workshop on Spoken Language Translation (IWSLT) is a
yearly, open evaluation campaign for spoken language translation followed by a scientific workshop, in which both system descriptions and scientific papers are presented. IWSLT's evaluations are not competition-oriented, but their goal is to foster cooperative work and scientific exchange. In this respect, IWSLT proposes challenging research tasks and an open experimental infrastructure for the scientific community working on spoken and written language translation.
Evaluation Campaign
The 6th International Workshop on Spoken Language Translation will take place in Tokyo, Japan in December 2009. The focus of this year's evaluation campaign will be the translation of task-oriented human dialogs in travel situations. The speech data was recorded through human interpreters: native speakers of different languages were asked to complete certain travel-related tasks, like hotel reservations, using their mother tongue, and the translation of the freely uttered conversation was carried out by human interpreters. The obtained speech data was annotated with dialog and speaker information. For the Challenge Task, IWSLT participants will have to translate both the Chinese and the English outputs of the automatic speech recognizers (lattice, N/1-BEST) into English and Chinese, respectively.
As in previous IWSLT events, a standard BTEC Task will be provided. However, the BTEC Task will focus on text input only, i.e., no automatic speech recognizer results (lattice, N/1-BEST) have to be translated. In addition to the Arabic-English and Chinese-English translation tasks, this year's evaluation campaign features Turkish as a new input language.
Each participant in the evaluation campaign is requested to submit a paper describing the MT system, the utilized resources, and the results on the provided test data. Contrastive run submissions using only the bilingual resources provided by IWSLT, as well as investigations into the contribution of each utilized resource, are highly appreciated. Moreover, all participants are requested to present their papers at the workshop.
Scientific Paper
In addition to the evaluation campaign, the IWSLT 2009 workshop also invites scientific paper submissions related to spoken language technologies. Possible topics include, but are not limited to:
- Spoken dialog modeling
- Integration of ASR and MT
- SMT, EBMT, RBMT, Hybrid MT
- MT evaluation
- Language resources for MT
- Open source software for MT
- Pivot-language-based MT
- Task adaptation and portability in MT
Evaluation Campaign
The evaluation campaign is carried out using BTEC (Basic Travel Expression Corpus), a multilingual speech corpus containing tourism-related sentences similar to those that are usually found in phrasebooks for tourists going abroad. In addition, parts of the SLDB (Spoken Language Databases) corpus, a collection of human-mediated cross-lingual dialogs in travel situations, are provided to the participants of the Challenge Task. Details about the supplied corpora, the data set conditions for each track, the guidelines on how to submit one's translation results, and the evaluation specifications used in this workshop are given below.
Please note that, compared to previous IWSLT evaluation campaigns, the guidelines on how to use the language resources for each data track have changed for IWSLT 2009. Starting in 2007, we encouraged everyone to collect out-of-domain language resources and tools that could be shared between the participants. This was very helpful for many participants and allowed many interesting experiments, but it had the side effect that system outputs were difficult to compare: it was impossible to find out whether certain gains in performance were triggered by better suited (or simply more) language resources (engineering aspects) or by improvements in the underlying decoding algorithms and statistical models (research aspects). After the IWSLT 2008 workshop, many participants asked us to focus on the research aspects for IWSLT 2009.
Therefore, the monolingual and bilingual language resources that may be used to train the translation engines for the primary runs are limited to the supplied corpus for each translation task. This includes all supplied development sets, i.e., you are free to use these data sets as you wish, for tuning of model parameters, as additional training bitext, etc.
All other language resources besides the ones supplied for the given translation task should be treated as "additional language resources". This includes, for example, any additional dictionaries, word lists, or bitext corpora such as those provided by LDC. In addition, some participants asked whether they could use the supplied BTEC TE and BTEC AE resources for the BTEC CE task; these should also be treated as "additional resources".
Because it is impossible to limit the usage of linguistic tools like word segmentation tools, parsers, etc., those tools are allowed to preprocess the supplied corpus, but we kindly ask participants to describe in detail which tools were applied for data preprocessing in their system description paper.
In order to motivate participants to continue to explore the effects of additional language resources (model adaptation, OOV handling, etc.), we DO ACCEPT contrastive runs based on additional resources. These will be evaluated automatically using the same framework as the primary runs, thus the results will be directly comparable to this year's primary runs and can be published by the participants in the MT system description paper or in a scientific paper.
Due to the workshop budget limits, however, it would be difficult to include all contrastive runs in the subjective evaluation. Therefore, we kindly ask participants for a contribution if they would like to obtain a human assessment of their contrastive runs as well. If you intend to do so, please contact us as soon as possible, so that we can adjust the evaluation schedule accordingly.
Contrastive run results will not appear in the overview paper, but participants are free to report their findings in the MT system description paper or even a separate scientific paper submission.
[Corpus Specifications]
[Translation Input Conditions]
[Evaluation Specifications]
Corpus Specifications
BTEC Training Corpus:
- example:
- TRAIN_00001\01\This is the first training sentence.
- TRAIN_00002\01\This is the second training sentence.
- Arabic-English (AE)
- Chinese-English (CE)
- Turkish-English (TE)
- 20K sentences randomly selected from the BTEC corpus
- coding: UTF-8
- text is case-sensitive and includes punctuation marks
BTEC Develop Corpus:
- text input example:
- DEV_001\01\This is the first develop sentence.
- DEV_002\01\This is the second develop sentence.
- reference translation example:
DEV_001\01\1st reference translation for 1st input
DEV_001\02\2nd reference translation for 1st input
...
DEV_002\01\1st reference translation for 2nd input
DEV_002\02\2nd reference translation for 2nd input
...
- Arabic-English
- CSTAR03 testset: 506 sentences, 16 reference translations
- IWSLT04 testset: 500 sentences, 16 reference translations
- IWSLT05 testset: 506 sentences, 16 reference translations
- IWSLT07 testset: 489 sentences, 6 reference translations
- IWSLT08 testset: 507 sentences, 16 reference translations
- Chinese-English
- CSTAR03 testset: 506 sentences, 16 reference translations
- IWSLT04 testset: 500 sentences, 16 reference translations
- IWSLT05 testset: 506 sentences, 16 reference translations
- IWSLT07 testset: 489 sentences, 6 reference translations
- IWSLT08 testset: 507 sentences, 16 reference translations
- Turkish-English
- CSTAR03 testset: 506 sentences, 16 reference translations
- IWSLT04 testset: 500 sentences, 16 reference translations
BTEC Test Corpus:
- Arabic-English
- Chinese-English
- Turkish-English
- 470 unseen sentences of the BTEC evaluation corpus
- coding: → see BTEC Develop Corpus
- data format: → see BTEC Develop Corpus
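All data files of the BTEC Task above share the same line format, <SENTENCE_ID>\<PARAPHRASE_ID>\<TEXT>, with the backslash as field separator. Purely as an illustration (the function and file names are hypothetical, not part of the supplied tools), such files can be read and the multiple references of a development set grouped by sentence ID as follows:

  from collections import defaultdict

  def read_iwslt_file(path):
      # Parse lines of the form '<SENTENCE_ID>\<PARAPHRASE_ID>\<TEXT>' and
      # return a dict mapping each sentence ID to its list of texts
      # (one entry per paraphrase/reference; the text itself may be empty).
      entries = defaultdict(list)
      with open(path, encoding="utf-8") as f:
          for line in f:
              if not line.strip():
                  continue
              sent_id, para_id, text = line.rstrip("\n").split("\\", 2)
              entries[sent_id].append(text)
      return entries

  # usage sketch (hypothetical file name):
  # refs = read_iwslt_file("IWSLT09.devset.CSTAR03.en.txt")
  # print(len(refs["DEV_001"]))   # number of reference translations for DEV_001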
CHALLENGE Training Corpus:
- Chinese-English (CE)
- English-Chinese (EC)
- 394 dialogs, 10K sentences from the SLDB corpus
- coding: UTF-8
- word segmentation according to the ASR output segmentation
- text is case-sensitive and includes punctuation marks
CHALLENGE Develop Corpus:
- ASR output (lattice, NBEST, 1BEST), correct recognition result transcripts (text), reference translations of SLDB dialogs
- data format:
- 1-BEST
- each line consists of three fields divided by the character '\'
- sentence consisting of words divided by single spaces
format: <SENTENCE_ID>\01\<RECOGNITION_HYPOTHESIS>
- Field_1: sentence ID
- Field_2: paraphrase ID
- Field_3: best recognition hypothesis
- example (input):
IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
IWSLT09_CT.devset_dialog01_06\01\best ASR hypothesis for 3rd utterance
...
- N-BEST
- each line consists of three fields divided by the character '\'
- sentence consisting of words divided by single spaces
format: <SENTENCE_ID>\<NBEST_ID>\<RECOGNITION_HYPOTHESIS>
- Field_1: sentence ID
- Field_2: NBEST ID (max: 20)
- Field_3: recognition hypothesis
- example (input):
IWSLT09_CT.devset_dialog01_02\01\best ASR hypothesis for 1st utterance
IWSLT09_CT.devset_dialog01_02\02\2nd-best ASR hypothesis for 1st utterance
...
IWSLT09_CT.devset_dialog01_02\20\20th-best ASR hypothesis for 1st utterance
IWSLT09_CT.devset_dialog01_04\01\best ASR hypothesis for 2nd utterance
...
- reference translations
- each line consists of three fields divided by the character '\'
- sentence consisting of words divided by single spaces
format: <SENTENCE_ID>\<PARAPHRASE_ID>\<REFERENCE>
- Field_1: sentence ID
- Field_2: paraphrase ID
- Field_3: reference translation
- example:
IWSLT09_CT.devset_dialog01_02\01\1st reference translation for 1st input
IWSLT09_CT.devset_dialog01_02\02\2nd reference translation for 1st input
...
IWSLT09_CT.devset_dialog01_04\01\1st reference translation for 2nd input
IWSLT09_CT.devset_dialog01_04\02\2nd reference translation for 2nd input
...
- Chinese-English
- IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
- IWSLT06 devset: 489 sentences, 16 reference translations (read speech, spontaneous speech)
- IWSLT06 testset: 500 sentences, 16 reference translations (read speech, spontaneous speech)
- IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
- IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
- IWSLT09 devset: 10 dialogs, 200 sentences, 4 reference translations (spontaneous speech)
- English-Chinese
- IWSLT05 testset: 506 sentences, 16 reference translations (read speech)
- IWSLT08 devset: 245 sentences, 7 reference translations (spontaneous speech)
- IWSLT08 testset: 506 sentences, 7 reference translations (spontaneous speech)
- IWSLT09 devset: 10 dialogs, 210 sentences, 4 reference translations (spontaneous speech)
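A brief note on the N-BEST files listed above: they contain up to 20 recognition hypotheses per utterance, with the hypothesis rank in the second field. The following sketch (a hypothetical helper, not one of the supplied tools) regroups the hypotheses per sentence ID, so that the first entry of each list is the 1-best hypothesis:

  def read_nbest(path, max_n=20):
      # Group ASR hypotheses by sentence ID from lines of the form
      # '<SENTENCE_ID>\<NBEST_ID>\<HYPOTHESIS>', keeping at most max_n per utterance.
      nbest = {}
      with open(path, encoding="utf-8") as f:
          for line in f:
              if not line.strip():
                  continue
              sent_id, rank, hyp = line.rstrip("\n").split("\\", 2)
              if int(rank) <= max_n:
                  nbest.setdefault(sent_id, []).append((int(rank), hyp))
      # sort by rank so that the first entry is the 1-best hypothesis
      return {sid: [h for _, h in sorted(hyps)] for sid, hyps in nbest.items()}

  # usage sketch (hypothetical file name):
  # hyps = read_nbest("IWSLT09.devset.zh.20BEST.txt")
  # one_best = {sid: cand[0] for sid, cand in hyps.items()}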
CHALLENGE Test Corpus:
- Chinese-English
- 27 dialogs, 405 sentences
- coding: → see CHALLENGE Develop Corpus
- TXT data format: → see CHALLENGE Develop Corpus
- INFO data format: → see CHALLENGE Training Corpus
- English-Chinese
- 27 dialogs, 393 sentences
- coding: → see CHALLENGE Develop Corpus
- TXT data format: → see CHALLENGE Training Corpus
- INFO data format: → see CHALLENGE Training Corpus
Translation Input Conditions
Spontaneous Speech
- Challenge Task
- Chinese-English
- English-Chinese
→ ASR output (word lattice, N-best, 1-best) of ASR engines provided by IWSLT organizers
Correct Recognition Results
- Challenge Task
- Chinese-English
- English-Chinese
- BTEC Task
- Arabic-English
- Chinese-English
- Turkish-English
→ text input
Evaluation
Subjective Evaluation:
- Metrics:
- ranking
(= official evaluation metric used to rank the MT systems)
→ all primary run submissions
- fluency/adequacy
→ top-ranked primary run submission
- dialog adequacy
(= adequacy judgments in the context of the given dialog)
→ top-ranked primary run submission
- Evaluators:
- 3 graders per translation
Automatic Evaluation:
- Metrics:
- BLEU/NIST (NIST v13)
→ bug fixes to handle empty translations and IWSLT supplied corpus can be found here.
→ up to 7 reference translations
→ all run submissions
- Evaluation Specifications:
- case+punc:
- case sensitive
- with punctuation marks tokenized
- no_case+no_punc:
- case insensitive (lower-case only)
- no punctuation marks
- Data Processing Prior to Evaluation:
- English MT Output:
- simple tokenization of punctuation marks (see the 'tools/ppEnglish.case+punc.pl' script)
- Chinese MT Output:
- segmentation into characters (see 'tools/splitUTF8Characters' script)
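The official scoring uses the NIST tools together with the supplied preprocessing scripts. Purely as an illustration of the two evaluation specifications above, a rough sketch of the assumed behaviour (not the official scripts) might look like this:

  import re

  PUNCT = re.compile(r'([.,!?;:"()])')

  def english_case_punc(text):
      # 'case+punc': keep case, but tokenize punctuation marks off the words
      return PUNCT.sub(r' \1 ', text).split()

  def english_no_case_no_punc(text):
      # 'no_case+no_punc': lowercase the text and drop punctuation marks
      return PUNCT.sub(' ', text.lower()).split()

  def chinese_characters(text):
      # split Chinese MT output into single characters; keep ASCII tokens intact
      tokens = []
      for tok in text.split():
          if all(ord(ch) < 128 for ch in tok):
              tokens.append(tok)
          else:
              tokens.extend(list(tok))
      return tokens

  # english_case_punc("Hello, world!")        -> ['Hello', ',', 'world', '!']
  # english_no_case_no_punc("Hello, world!")  -> ['hello', 'world']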
Important Dates
Evaluation Campaign
Event | Date
Training Corpus Release | June 19, 2009
Test Corpus Release | August 14, 2009
Run Submission Due | August 28, 2009
Result Feedback to Participants | September 11, 2009
MT System Descriptions Due | September 18, 2009
Notification of Acceptance | October 16, 2009
Camera-ready Paper Due | October 31, 2009
Workshop | December 1 - 2, 2009
Technical Papers
Event | Date
Paper Submission Due | August 21, 2009
Notification of Acceptance | October 9, 2009
Camera-ready Paper Due | October 31, 2009
Workshop | December 1 - 2, 2009
Downloads
IWSLT 2009 Corpus Release (for IWSLT 2009 participants only)
User License Agreement | DOC, PDF
Download | CHALLENGE Task | BTEC Task
In order to get access to the corpus, please follow the procedure below.
Access will be enabled AFTER we have received your original signed user license agreement.
- download the post-workshop user license agreement (click on the DOC/PDF link above), sign it, and send two copies to:
Michael Paul
National Institute of Information and Communications Technology
Knowledge Creating Communication Research Center
MASTAR Project
Language Translation Group
3-5 Hikaridai, Keihanna Science City
Kyoto 619-0289, Japan
- download the corpus files using the ID and password you obtained for the download of the training data files for IWSLT 2009.
Corpus Data Files
- train:
  - (BTEC) 20K sentence pairs of translation examples with case and punctuation information, segmented according to the utilized ASR engine
  - (CHALLENGE) in addition to the BTEC train data, 10K sentence pairs of translation examples with dialog annotations
- dev: up to 6 evaluation data sets containing 500 source language sentences with multiple reference translations and ASR output data files (= test sets of previous IWSLT evaluation campaigns)
- test: 500 source language sentences and ASR output data files (= input of the run submissions of this year's evaluation campaign)
- tools: preprocessing scripts (tokenization, NBEST extraction, etc.) used to prepare the data sets
For data set details, click on the translation direction name tag.
Templates for LaTeX/MSWord
Submission
Submissions of Technical Papers and MT System Descriptions must be done electronically in PDF format using the above links. The style-files and templates are available at the download page. Authors are strongly encouraged to use the provided LaTeX style files or MS-Word equivalents. Submissions should follow the "Paper Submission Format Guidelines" listed below.
Paper Submission Format Guidelines
The format of each paper submission (evaluation campaign and technical paper) should agree with the "Camera-Ready Paper Format Guidelines" listed below.
Camera-Ready Paper Format Guidelines
- PDF file format
- Maximum eight (8) pages (Standard A4 size: 210 mm by 297 mm preferred)
- Single-spaced
- Two (2) columns
- Print in black ink on white paper, and check that the positioning (left and top margins) as well as other layout features are correct.
- No smaller than nine (9) point type font throughout the paper, including figure captions.
- To achieve the best viewing experience for the Proceedings, we strongly encourage authors to use the Times-Roman font (the LaTeX style file as well as the Word template files use Times-Roman). This is needed in order to give the Proceedings a uniform look.
- Do NOT include headers and footers. The page numbers and conference identification will be post processed automatically, at the time of printing the Proceedings.
- The first page should have the paper title, author(s), and affiliation(s) centered on the page across both columns. The remainder of the text must be in the two-column format, staying within the indicated image area.
- Follow the style of the sample paper that is included with regard to title, authors, affiliations, abstract, heading, and subheadings.
Paper Title
The paper title must be in boldface. All non-function words must be capitalized, and all other words in the title must be lower case. The paper title is centered across the top of the two columns on the first page, as indicated above.
Authors' Name(s)
The authors' name(s) and affiliation(s) appear centered below the paper title. If space permits, include a mailing address here. The templates indicate the area where the title and author information should go. These items need not be strictly confined to the number of lines indicated; papers with multiple authors and affiliations, for example, may require two or more lines for this information.
Abstract
Each paper must contain an abstract that appears at the beginning of the paper.
Major Headings
Major headings are in boldface, with the first word capitalized and the rest of the heading in lower case. Examples of the various levels of headings are included in the templates.
Sub Headings
Sub headings appear like major headings, except they start at the left margin in the column.
Sub-Sub Headings
Sub-sub headings appear like sub headings, except they are in italics and not bold face.
References
Number and list all references at the end of the paper. The references are numbered in order of appearance in the document. When referring to them in the text, type the corresponding reference number in square brackets as shown at the end of this sentence [1].
(This is done automatically when using the Latex template).
Illustrations
Illustrations must appear within the designated margins, and must be positioned within the paper margins. They may span the two columns. If possible, position illustrations at the top of columns, rather than in the middle or at the bottom. Caption and number every illustration. All half-tone or color illustrations must be clear when printed in black and white.
Templates
If your paper will be typeset using LaTeX, please download the template package here that will generate the proper format.
To extract the files under UNIX run: $ tar -xzf latex_template_iwslt09.tgz
Paper Status
After submission, each paper will be given a unique Paper ID and a password. This will be shown on the confirmation page right after submission of the documents and a confirmation email including the Paper ID will be sent to the author of the paper as well. It will be possible to check and correct (if necessary) the submitted paper information (names, affiliations, etc.). Corrections/uploads can be made up to the respective submission deadline.
Paper Acceptance/Rejection Information
Each corresponding author will be notified by e-mail of acceptance/rejection. Reviewer feedback will also be available for each paper.
Run Submission Guidelines
BTEC Translation Task (BTEC_AE, BTEC_CE, BTEC_TE)
data format:
- same format as the DEVELOP data sets.
For details, refer to the respective README files:
+ IWSLT/2009/corpus/BTEC/Arabic-English/README.BTEC_AE.txt
+ IWSLT/2009/corpus/BTEC/Chinese-English/README.BTEC_CE.txt
+ IWSLT/2009/corpus/BTEC/Turkish-English/README.BTEC_TE.txt
- input text is case-sensitive and contains punctuation marks
- English MT output should:
- be in the same format as the input file (<SentenceID>\01\MT_output_text)
- be case-sensitive, with punctuation marks
- contain the same number of lines (= sentences) as the input file
Example:
TEST_IWSLT09_001\01\This is the E translation of the 1st sentence.
TEST_IWSLT09_002\01\This is the E translation of the 2nd sentence.
TEST_IWSLT09_003\01\
TEST_IWSLT09_004\01\The previous input (ID=003) could not be translated, thus the translation is empty!
TEST_IWSLT09_005\01\...
...
TEST_IWSLT09_469\01\This is the E translation of the last sentence.
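Before packaging a run, it may be worth checking that an MT output file satisfies the constraints above, i.e., the same number of lines and the same sentence IDs as the input file (empty translations are allowed). A minimal, hypothetical check (not part of the official submission procedure) could look like this:

  def read_ids(path):
      # collect the sentence ID (first field) of every line, keeping the line order
      with open(path, encoding="utf-8") as f:
          return [line.rstrip("\n").split("\\", 2)[0] for line in f]

  def check_run(input_path, output_path):
      in_ids, out_ids = read_ids(input_path), read_ids(output_path)
      assert len(in_ids) == len(out_ids), "line counts differ: %d vs %d" % (len(in_ids), len(out_ids))
      for i, (a, b) in enumerate(zip(in_ids, out_ids), start=1):
          assert a == b, "sentence ID mismatch at line %d: %s vs %s" % (i, a, b)
      print("OK: %d lines, sentence IDs match" % len(in_ids))

  # usage sketch (hypothetical file names):
  # check_run("IWSLT09_BTEC.testset.zh.txt", "BTEC_CE.nict.primary.txt")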
run submission format:
- each participant has to translate and submit at least one translation of the given input files for each of the translation tasks they registered for.
- multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for the human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
- runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
TAR archive file structure:
<UserID>/<TranslationTask>.<UserID>.primary.txt
/<TranslationTask>.<UserID>.contrastive1.txt
/<TranslationTask>.<UserID>.contrastive2.txt
/...
where: <UserID> = user ID of participant used to download data files
<TranslationTask> = BTEC_AE | BTEC_CE | BTEC_TE
Examples:
nict/BTEC_AE.nict.primary.txt
/BTEC_CE.nict.primary.txt
/BTEC_CE.nict.contrastive1.txt
/BTEC_CE.nict.contrastive2.txt
/BTEC_CE.nict.contrastive3.txt
/BTEC_TE.nict.primary.txt
/BTEC_TE.nict.contrastive1.txt
- re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2009 evaluation and previous mails will be ignored.
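The archive layout above can be produced with standard tools; as a rough sketch (the helper name and local paths are hypothetical), Python's tarfile module can be used as follows:

  import tarfile

  def build_submission(user_id, run_files, archive_name=None):
      # Pack run files into a gzipped TAR archive with the layout <UserID>/<file name>.
      # 'run_files' maps the file name inside the archive to the local path of the MT output.
      archive_name = archive_name or ("%s.IWSLT09.runs.tar.gz" % user_id)
      with tarfile.open(archive_name, "w:gz") as tar:
          for arc_name, local_path in run_files.items():
              tar.add(local_path, arcname="%s/%s" % (user_id, arc_name))
      return archive_name

  # usage sketch (hypothetical local paths):
  # build_submission("nict", {
  #     "BTEC_CE.nict.primary.txt": "runs/btec_ce.primary.txt",
  #     "BTEC_CE.nict.contrastive1.txt": "runs/btec_ce.contrastive1.txt",
  # })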
CHALLENGE Translation Task (CT_CE, CT_EC)
data format:
- same format as the DEVELOP data sets.
For details, refer to the respective README files:
+ IWSLT/2009/corpus/CHALLENGE/Chinese-English/
README.CT_CE.txt
+ IWSLT/2009/corpus/CHALLENGE/English-Chinese/
README.CT_EC.txt
- the input data sets are created from the speech recognition results (ASR output) and therefore are CASE-INSENSITIVE and do NOT contain punctuation marks
- the input data sets of the CHALLENGE tasks are separated according to the source language:
+ Chinese input data:
IWSLT/2009/corpus/CHALLENGE/Chinese-English/test
+ English input data:
IWSLT/2009/corpus/CHALLENGE/English-Chinese/test
The dialog structure is reflected in the respective sentence ID.
Example:
(dialog structure)
IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
...
(English input data to be translated into Chinese)
+ IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
IWSLT09_CT.testset.en.txt
IWSLT09_CT.testset_dialog01_01\01\...1st English utterance...
IWSLT09_CT.testset_dialog01_03\01\...2nd English utterance...
IWSLT09_CT.testset_dialog01_06\01\...3rd English utterance...
...
(Chinese input data to be translated into English)
+ IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
IWSLT09_CT.testset.zh.txt
IWSLT09_CT.testset_dialog01_02\01\...1st Chinese utterance...
IWSLT09_CT.testset_dialog01_04\01\...2nd Chinese utterance...
IWSLT09_CT.testset_dialog01_05\01\...3rd Chinese utterance...
...
- English MT output should:
+ be in the same format as the Chinese input file
(<SentenceID>\01\MT_output_text)
+ be case-sensitive, with punctuation marks
+ contain the same number of lines (= sentences) as the Chinese input file
Example:
+ nict/CT_CE.nict.primary.txt
IWSLT09_CT.testset_dialog01_02\01\...E translation of 1st Chinese utterance...
IWSLT09_CT.testset_dialog01_04\01\...E translation of 2nd Chinese utterance...
IWSLT09_CT.testset_dialog01_05\01\...E translation of 3rd Chinese utterance...
...
- Chinese MT output should:
+ be in the same format as the English input file (<SentenceID>\01\MT_output_text)
+ be case-sensitive, with punctuation marks
+ contain the same number of lines (= sentences) as the English input file
Example:
+ nict/CT_EC.nict.primary.txt
IWSLT09_CT.testset_dialog01_01\01\...C translation of 1st English utterance...
IWSLT09_CT.testset_dialog01_03\01\...C translation of 2nd English utterance...
IWSLT09_CT.testset_dialog01_06\01\...C translation of 3rd English utterance...
...
run submission format:
- each participant registered for the Challenge Task has to translate both translation directions (English-Chinese AND Chinese-English) and submit a total of 4 MT output files per run:
+ translations of 2 input data conditions (CRR, ASR) for Chinese-English AND
+ translations of 2 input data conditions (CRR, ASR) for English-Chinese.
(1) the correct recognition result (CRR) data files, i.e., the human transcriptions of the Challenge Task data files that do not include recognition errors:
CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/TXT/
IWSLT09_CT.testset.zh.txt
EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/TXT/
IWSLT09_CT.testset.en.txt
(2) the speech recognition output (ASR output, with recognition errors), whereby the participants are free to choose any of the following three ASR output data types as the input of their MT system:
(a) word lattices:
CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
SLF/testset/*.zh.SLF
EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
SLF/testset/*.en.SLF
(b) NBEST hypotheses:
CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
NBEST/IWSLT09.testset.zh.20BEST.txt
or
IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
NBEST/testset/*.zh.20BEST.txt
EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
NBEST/IWSLT09.testset.en.20BEST.txt
or
IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
NBEST/testset/*.en.20BEST.txt
[NOTE] larger NBEST lists can be generated from the lattice data files using the following tools:
+ IWSLT/2009/corpus/CHALLENGE/Chinese-English/tools/
extract_NBEST.zh.CT_CE.testset.sh
+ IWSLT/2009/corpus/CHALLENGE/English-Chinese/tools/
extract_NBEST.en.CT_EC.testset.sh
(c) 1BEST hypotheses:
CE: IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
1BEST/IWSLT09.testset.zh.1BEST.txt
or
IWSLT/2009/corpus/CHALLENGE/Chinese-English/test/
1BEST/testset/*.zh.1BEST.txt
EC: IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
1BEST/IWSLT09.testset.en.1BEST.txt
or
IWSLT/2009/corpus/CHALLENGE/English-Chinese/test/
1BEST/testset/*.en.1BEST.txt
[NOTE] submissions containing only the results for one translation direction will be excluded from the subjective evaluation for IWSLT 2009.
- multiple run submissions are allowed, but participants have to explicitly indicate one PRIMARY run that will be used for the human assessments. All other run submissions are treated as CONTRASTIVE runs. If none of the runs is marked as PRIMARY, the latest submission (according to the file time-stamp) will be used for the subjective evaluation.
- runs have to be submitted as a gzipped TAR archive (format see below) and sent as an email attachment to "Michael Paul" (michael.paul@nict.go.jp).
TAR archive file structure:
<UserID>/CT_CE.<UserID>.primary.CRR.txt
/CT_CE.<UserID>.primary.ASR.<CONDITION>.txt
/CT_EC.<UserID>.primary.CRR.txt
/CT_EC.<UserID>.primary.ASR.<CONDITION>.txt
/...
where: <UserID> = user ID of participant used to download data files
<CONDITION> = SLF | <NUM>
<NUM> = number of recognition hypotheses used for translation, e.g.,
'1' - 1-best recognition result
'20' - 20-best hypotheses list
Examples:
nict/CT_CE.nict.primary.CRR.txt
/CT_CE.nict.primary.ASR.SLF.txt
/CT_EC.nict.primary.CRR.txt
/CT_EC.nict.primary.ASR.SLF.txt
/CT_CE.nict.contrastive1.CRR.txt
/CT_CE.nict.contrastive1.ASR.1.txt
/CT_EC.nict.contrastive1.CRR.txt
/CT_EC.nict.contrastive1.ASR.1.txt
/CT_CE.nict.contrastive2.CRR.txt
/CT_CE.nict.contrastive2.ASR.20.txt
/CT_EC.nict.contrastive2.CRR.txt
/CT_EC.nict.contrastive2.ASR.20.txt
- re-submitting your runs is allowed as long as the mails arrive BEFORE the submission deadline. If multiple TAR archives are submitted by the same participant, only the runs of the most recent submission mail will be used for the IWSLT 2009 evaluation and previous mails will be ignored.
Automatic Evaluation Server
We prepared an online evaluation server that allows you to conduct additional experiments to confirm the effectiveness of innovative methods and features within the IWSLT 2009 evaluation framework. You can submit translation hypothesis files for any of the IWSLT 2009 translation tasks. The hypothesis file format is the same as for the official run submissions.
Before you can submit runs, you have to register a UserID/PassID. After login, click on "Make a new Submission", select the "Translation Direction" and "Training Data Condition" you used to generate the hypothesis file, upload the hypothesis file, specify a system ID and a short description that allows you to easily identify the run submission, and press "Calculate Scores".
The server will sequentially calculate automatic scores for BLEU/NIST, WER/PER/TER, METEOR/F1/PREC/RECL, and GTM. Finally, the automatic scoring results will be sent to you via email. In addition, you can access the "Submission Log", which keeps track of all your run submissions. For details on a specific run, please click on the respective "Date". The scoring results for the "case+punc" evaluation specification (case-sensitive, with punctuation marks) are displayed in boldface, and the scoring results for the "no_case+no_punc" evaluation specification (case-insensitive, without punctuation marks) are displayed in brackets.
Registration
Workshop
The registration for the IWSLT 2009 workshop is now open. Please access the registration server and fill out the registration form.
Registration | Fees (Regular) | Fees (Student) | Deadline | Payment
Early | JPY 20,000 | JPY 15,000 | Nov 20, 2009 | online
Late | JPY 25,000 | JPY 20,000 | Nov 30, 2009 | at the door (registration desk)
On-site | JPY 30,000 | JPY 25,000 | Dec 1 - 2, 2009 | at the door (registration desk)
The registration fees include daily lunch, coffee breaks, a USB-stick version of the proceedings, participation in all sessions, and a banquet dinner on December 1. Please note that the registration fee is not refundable under any circumstances.
Online payment is only possible until the extended Early Registration Deadline. During the Late Registration period, you have to fill in the registration form online, but the payment has to be made at the door at the workshop registration desk (7F Miraikan). For payments at the door, only cash can be accepted.
If you need a visa for coming to Japan, please contact the IWSLT Secretariat (iwslt@the-convention.co.jp) as soon as possible, but no later than October 2nd, 2009.
If you do not know whether you need a visa, please check here.
Accommodations
Hotel rates differ from day to day.
Please check out the links below or contact the hotel directly.
Program
December 1, 2009
09:00 - 09:30 | workshop registration
coffee break
Evaluation Campaign: "Challenge Task"
10:30 - 11:00 | Two methods for stabilizing MERT: NICT at IWSLT 2009 | Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA (NICT, Japan)
11:00 - 11:30 | Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009 | Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY (Dublin City University, Ireland)
11:30 - 12:00 | The CASIA Statistical Machine Translation System for IWSLT 2009 | Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; China)
lunch break
Invited Talk
13:00 - 14:00 | Human Translation and Machine Translation | Philipp KOEHN (University of Edinburgh, UK)
Technical Paper: "Oral I"
14:00 - 14:30 | Morphological Pre-Processing for Turkish to English Statistical Machine Translation | Arianna BISAZZA, Marcello FEDERICO (FBK-irst, Italy)
14:30 - 15:00 | Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing | Martin CMEJREK, Bowen ZHOU, Bing XIANG (IBM, USA)
15:00 - 15:30 | A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation | Hieu HOANG, Philipp KOEHN, Adam LOPEZ (Univ. Edinburgh, UK)
coffee break
Evaluation Campaign: "Poster I"
15:50 - 16:50 | The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009 | Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN (TÜBITAK-UEKAE, Turkey)
15:50 - 16:50 | The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures | Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII (University of Tokyo, Japan)
15:50 - 16:50 | The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory | Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME (University of Caen, France)
15:50 - 16:50 | The ICT Statistical Machine Translation Systems for the IWSLT 2009 | Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU (Institute of Computing Technology, Chinese Academy of Sciences; China)
15:50 - 16:50 | The University of Washington Machine Translation System for IWSLT 2009 | Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF (University of Washington, USA)
15:50 - 16:50 | Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation | Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA (Tottori University, Japan)
Demo Session
16:50 - 17:20 | Network-based Speech-to-Speech Translation | Chiori HORI, Sakriani SAKTI, Michael PAUL, Satoshi NAKAMURA (NICT, Japan)
Banquet
18:00 - 20:00 | Restaurant "LA TERRE" (Miraikan, 7F)
December 2, 2009
09:00 - 09:30 | workshop registration
coffee break
lunch break
Invited Talk
13:00 - 14:00 | Monolingual Knowledge Acquisition and a Multilingual Information Environment | Kentaro TORISAWA (NICT, Japan)
Evaluation Campaign: "Poster II"
14:00 - 15:00 | AppTek Turkish-English Machine Translation System Description for IWSLT 2009 | Selçuk KÖPRÜ (Apptek Inc., Turkey)
14:00 - 15:00 | LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic | Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
14:00 - 15:00 | Barcelona Media SMT system description for the IWSLT 2009: introducing source context information | Marta R. COSTA-JUSSA, Rafael E. BANCHS (Barcelona Media, Spain)
14:00 - 15:00 | FBK @ IWSLT-2009 | Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
14:00 - 15:00 | LIUM's Statistical Machine Translation Systems for IWSLT 2009 | Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT (University of Le Mans, France)
14:00 - 15:00 | I²R's Machine Translation System for IWSLT 2009 | Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI (Institute for Infocomm Research, Singapore)
coffee break
Evaluation Campaign: "BTEC Task"
15:20 - 15:50 | The NUS Statistical Machine Translation System for IWSLT 2009 | Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG (National University of Singapore, Singapore)
15:50 - 16:20 | The UPV Translation System for IWSLT 2009 | Guillem GASCÓ, Joan Andreu SÁNCHEZ (Universitat Politècnica de València, Spain)
16:20 - 16:50 | The MIT-LL/AFRL System for IWSLT 2009 | Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory, USA)
Workshop Closing
16:50 - 17:00 | Closing Remarks | Marcello FEDERICO (FBK-irst, Italy)
Keynote Speeches
Keynote Speech 1
Human Translation and Machine Translation
Philipp KOEHN (University of Edinburgh, UK)
While most recent machine translation work has focused on the gisting application (i.e., translating web pages), another important application is aiding human translators. To build better computer-aided translation tools, we first need to understand how human translators work. We discuss how human translators work and what tools they typically use. We also built a novel tool that offers post-editing, interactive sentence completion, and display of translation options (online at www.caitra.org). We collected timing logs of interactions with the tool, which allow a detailed analysis of translator behavior.
Keynote Speech 2
Two-way Speech-to-Speech Translation for Communicating Across Language Barriers
Premkumar NATARAJAN (BBN Technologies, USA)
Two-way speech-to-speech (S2S) translation is a spoken language application that integrates multiple technologies, including speech recognition, machine translation, text-to-speech synthesis, and dialog management. In recent years, research into S2S systems has resulted in several modeling techniques for improving coverage on broad domains and for rapid configuration for new language pairs or domains. This talk will highlight recent advances in the S2S area, ranging from improvements in component technologies to improvements in the end-to-end system for mobile use. I will also present metrics for evaluating S2S technology, a methodology for determining the impact of different causes of errors, and future directions for research and development.
Keynote Speech 3
Monolingual Knowledge Acquisition and a Multilingual Information Environment
Kentaro TORISAWA (NICT, Japan)
Large-scale knowledge acquisition from the Web has been a popular research topic over the last five years. This talk gives an overview of our current project, which aims at the acquisition of a large-scale semantic network from the Web, and explores its possible interaction with machine translation research. In particular, I would like to focus on two topics: multilingual corpora as a source of knowledge, and applications of machine translation enabled by our technology. I will discuss a framework of bilingual co-training that markedly improves the accuracy of the acquired knowledge by using two corpora written in two different languages. I will also show that our technology can enable new types of machine translation tasks in Web applications.
Proceedings
- Author Index -
Evaluation Campaign
pp.1-18 | paper | slides | bib | Overview of the IWSLT 2009 Evaluation Campaign | Michael PAUL
pp.19-23 | paper | (not yet) | bib | apptek | AppTek Turkish-English Machine Translation System Description for IWSLT 2009 | Selçuk KÖPRÜ
pp.24-28 | paper | poster | bib | bmrc | Barcelona Media SMT system description for the IWSLT 2009: introducing source context information | Marta R. COSTA-JUSSA, Rafael E. BANCHS
pp.29-36 | paper | slides | bib | dcu | Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009 | Yanjun MA, Tsuyoshi OKITA, Özlem ÇETINOGLU, Jinhua DU, Andy WAY
pp.37-44 | paper | poster | bib | fbk | FBK @ IWSLT-2009 | Nicola BERTOLDI, Arianna BISAZZA, Mauro CETTOLO, Marcello FEDERICO (FBK-irst, Italy); Germán SANCHIS-TRILLES (Universitat Politècnica de València, Spain)
pp.45-49 | paper | poster | bib | greyc | The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory | Yves LEPAGE, Adrien LARDILLEUX, Julien GOSME
pp.50-54 | paper | poster | bib | i2r | I²R's Machine Translation System for IWSLT 2009 | Xiangyu DUAN, Deyi XIONG, Hui ZHANG, Min ZHANG, Haizhou LI
pp.55-59 | paper | poster | bib | ict | The ICT Statistical Machine Translation Systems for the IWSLT 2009 | Haitao MI, Yang LIU, Tian XIA, Xinyan XIAO, Yang FENG, Jun XIE, Hao XIONG, Zhaopeng TU, Daqi ZHENG, Yajuan LU, Qun LIU
pp.60-64 | paper | poster | bib | lig | LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic | Fethi BOUGARES, Laurent BESACIER, Hervé BLANCHON (LIG, France)
pp.65-70 | paper | poster | bib | lium | LIUM's Statistical Machine Translation Systems for IWSLT 2009 | Holger SCHWENK, Loïc BARRAULT, Yannick ESTÈVE, Patrik LAMBERT
pp.71-78 | paper | slides | bib | mit | The MIT-LL/AFRL IWSLT-2009 System | Wade SHEN, Brian DELANEY, Arya Ryan AMINZADEH (MIT Lincoln Laboratory, USA); Timothy ANDERSON, Raymond SLYH (Air Force Research Laboratory)
pp.79-82 | paper | slides | bib | nict | Two methods for stabilizing MERT: NICT at IWSLT 2009 | Masao UTIYAMA, Hirofumi YAMAMOTO, Eiichiro SUMITA
pp.83-90 | paper | slides | bib | nlpr | The CASIA Statistical Machine Translation System for IWSLT 2009 | Maoxi LI, Jiajun ZHANG, Yu ZHOU, Chengqing ZONG
pp.91-98 | paper | poster | bib | nus | The NUS Statistical Machine Translation System for IWSLT 2009 | Preslav NAKOV, Chang LIU, Wei LU, Hwee Tou NG
pp.99-106 | paper | poster | bib | tokyo | The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures | Xianchao WU, Takuya MATSUZAKI, Naoaki OKAZAKI, Yusuke MIYAO, Jun'ichi TSUJII
pp.107-112 | paper | poster | bib | tottori | Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation | Jin'ichi MURAKAMI, Masato TOKUHISA, Satoru IKEHARA
pp.113-117 | paper | slides | bib | tubitak | The TÜBITAK-UEKAE Statistical Machine Translation System for IWSLT 2009 | Coskun MERMER, Hamza KAYA, Mehmet Ugur DOGAN
pp.118-123 | paper | poster | bib | upv | UPV Translation System for IWSLT 2009 | Guillem GASCÓ, Joan Andreu SÁNCHEZ
pp.124-128 | paper | poster | bib | uw | The University of Washington Machine Translation System for IWSLT 2009 | Mei YANG, Amittai AXELROD, Kevin DUH, Katrin KIRCHHOFF
Technical Paper
pp.129-135 | paper | slides | bib | Morphological Pre-Processing for Turkish to English Statistical Machine Translation | Arianna BISAZZA, Marcello FEDERICO
pp.136-143 | paper | slides | bib | Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing | Martin CMEJREK, Bowen ZHOU, Bing XIANG
pp.144-151 | paper | slides | bib | Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation | Katsuhiko HAYASHI, Taro WATANABE, Hajime TSUKADA, Hideki ISOZAKI
pp.152-159 | paper | slides | bib | A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation | Hieu HOANG, Philipp KOEHN, Adam LOPEZ
pp.160-167 | paper | slides | bib | Online Language Model Adaptation for Spoken Dialog Translation | German SANCHIS-TRILLES, Mauro CETTOLO, Nicola BERTOLDI, Marcello FEDERICO
Demo
pp.168-168 | paper | slides | bib | Network-based Speech-to-Speech Translation | Chiori HORI, Sakriani SAKTI, Michael PAUL, Noriyuki KIMURA, Yutaka ASHIKARI, Ryosuke ISOTANI, Eiichiro SUMITA, Satoshi NAKAMURA (NICT, Japan)
Keynote Speech
- | abstract | slides | - | Human Translation and Machine Translation | Philipp KOEHN (University of Edinburgh, UK)
- | abstract | (not yet) | - | Two-way Speech-to-Speech Translation for Communicating Across Language Barriers | Premkumar NATARAJAN (BBN Technologies, USA)
- | abstract | slides | - | Monolingual Knowledge Acquisition and a Multilingual Information Environment | Kentaro TORISAWA (NICT, Japan)
Author Index
A-B-C-D-E-F-G-H-I-K-L-M-N-O-P-S-T-U-W-X-Y-Z
A
AMINZADEH, Arya Ryan | 71
ANDERSON, Timothy | 71
ASHIKARI, Yutaka | 168
AXELROD, Amittai | 124
B
BANCHS, Rafael E. | 24
BARRAULT, Loïc | 65
BERTOLDI, Nicola | 37, 160
BESACIER, Laurent | 60
BISAZZA, Arianna | 37, 129
BLANCHON, Hervé | 60
BOUGARES, Fethi | 60
C
ÇETINOGLU, Özlem | 29
CETTOLO, Mauro | 37, 160
CMEJREK, Martin | 136
COSTA-JUSSÀ, Marta R. | 24
D
DELANEY, Brian | 71
DOGAN, Mehmet Ugur | 113
DU, Jinhua | 29
DUAN, Xiangyu | 50
DUH, Kevin | 124
E
ESTÈVE, Yannick | 65
F
FEDERICO, Marcello | 37, 129, 160
FENG, Yang | 55
G
GASCÓ, Guillem | 118
GOSME, Julien | 45
H
HAYASHI, Katsuhiko | 144
HOANG, Hieu | 152
HORI, Chiori | 168
I
IKEHARA, Satoru | 107
ISOTANI, Ryosuke | 168
ISOZAKI, Hideki | 144
K
KAYA, Hamza | 113
KIMURA, Noriyuki | 168
KIRCHHOFF, Katrin | 124
KOEHN, Philipp | 152
KÖPRÜ, Selçuk | 19
L
LAMBERT, Patrik | 65
LARDILLEUX, Adrien | 45
LEPAGE, Yves | 45
LI, Haizhou | 50
LI, Maoxi | 83
LIU, Chang | 91
LIU, Qun | 55
LIU, Yang | 55
LOPEZ, Adam | 152
LU, Wei | 91
LU, Yajuan | 55
M
MA, Yanjun | 29
MATSUZAKI, Takuya | 99
MERMER, Coskun | 113
MI, Haitao | 55
MIYAO, Yusuke | 99
MURAKAMI, Jin'ichi | 107
N
NAKAMURA, Satoshi | 168
NAKOV, Preslav | 91
NG, Hwee Tou | 91
O
OKAZAKI, Naoaki | 99
OKITA, Tsuyoshi | 29
P
PAUL, Michael | 1, 168
S
SAKTI, Sakriani | 168
SÁNCHEZ, Joan Andreu | 118
SANCHIS-TRILLES, Germán | 37, 160
SCHWENK, Holger | 65
SHEN, Wade | 71
SLYH, Raymond | 71
SUMITA, Eiichiro | 79, 168
T
TOKUHISA, Masato | 107
TSUJII, Jun'ichi | 99
TSUKADA, Hajime | 144
TU, Zhaopeng | 55
U
UTIYAMA, Masao | 79
W
WATANABE, Taro | 144
WAY, Andy | 29
WU, Xianchao | 99
X
XIA, Tian | 55
XIANG, Bing | 136
XIAO, Xinyan | 55
XIE, Jun | 55
XIONG, Deyi | 50
XIONG, Hao | 55
Y
YAMAMOTO, Hirofumi | 79
YANG, Mei | 124
Z
ZHANG, Hui | 50
ZHANG, Jiajun | 83
ZHANG, Min | 50
ZHENG, Daqi | 55
ZHOU, Bowen | 136
ZHOU, Yu | 83
ZONG, Chengqing | 83
Bibliography
@inproceedings{iwslt09:EC:overview,
  author    = {Michael Paul},
  title     = {{Overview of the IWSLT 2009 Evaluation Campaign}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {1-18},
}

@inproceedings{iwslt09:EC:apptek,
  author    = {Sel\c{c}uk K\"{o}pr\"{u}},
  title     = {{AppTek Turkish-English Machine Translation System Description for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {19-23},
}

@inproceedings{iwslt09:EC:bmrc,
  author    = {Marta R. Costa-Juss\`{a} and Rafael E. Banchs},
  title     = {{Barcelona Media SMT system description for the IWSLT 2009: introducing source context information}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {24-28},
}

@inproceedings{iwslt09:EC:dcu,
  author    = {Yanjun Ma and Tsuyoshi Okita and \"{O}zlem \c{C}etino\u{g}lu and Jinhua Du and Andy Way},
  title     = {{Low-Resource Machine Translation Using MaTrEx: The DCU Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {29-36},
}

@inproceedings{iwslt09:EC:fbk,
  author    = {Nicola Bertoldi and Arianna Bisazza and Mauro Cettolo and Germ\'{a}n Sanchis-Trilles and Marcello Federico},
  title     = {{FBK @ IWSLT-2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {37-44},
}

@inproceedings{iwslt09:EC:greyc,
  author    = {Yves Lepage and Adrien Lardilleux and Julien Gosme},
  title     = {{The GREYC Translation Memory for the IWSLT 2009 Evaluation Campaign: one step beyond translation memory}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {45-49},
}

@inproceedings{iwslt09:EC:i2r,
  author    = {Xiangyu Duan and Deyi Xiong and Hui Zhang and Min Zhang and Haizhou Li},
  title     = {{I${}^{2}$R's Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {50-54},
}

@inproceedings{iwslt09:EC:ict,
  author    = {Haitao Mi and Yang Liu and Tian Xia and Xinyan Xiao and Yang Feng and Jun Xie and Hao Xiong and Zhaopeng Tu and Daqi Zheng and Yajuan Lu and Qun Liu},
  title     = {{The ICT Statistical Machine Translation Systems for the IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {55-59},
}

@inproceedings{iwslt09:EC:lig,
  author    = {Fethi Bougares and Laurent Besacier and Herv\'{e} Blanchon},
  title     = {{LIG approach for IWSLT09 : Using Multiple Morphological Segmenters for Spoken Language Translation of Arabic}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {60-64},
}

@inproceedings{iwslt09:EC:lium,
  author    = {Holger Schwenk and Lo\"{i}c Barrault and Yannick Est\`{e}ve and Patrik Lambert},
  title     = {{LIUM's Statistical Machine Translation Systems for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {65-70},
}

@inproceedings{iwslt09:EC:mit,
  author    = {Wade Shen and Brian Delaney and Arya Ryan Aminzadeh and Timothy Anderson and Raymond Slyh},
  title     = {{The MIT-LL/AFRL IWSLT-2009 System}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {71-78},
}

@inproceedings{iwslt09:EC:nict,
  author    = {Masao Utiyama and Hirofumi Yamamoto and Eiichiro Sumita},
  title     = {{Two methods for stabilizing MERT: NICT at IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {79-82},
}

@inproceedings{iwslt09:EC:nlpr,
  author    = {Maoxi Li and Jiajun Zhang and Yu Zhou and Chengqing Zong},
  title     = {{The CASIA Statistical Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {83-90},
}

@inproceedings{iwslt09:EC:nus,
  author    = {Preslav Nakov and Chang Liu and Wei Lu and Hwee Tou Ng},
  title     = {{The NUS Statistical Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {91-98},
}

@inproceedings{iwslt09:EC:tokyo,
  author    = {Xianchao Wu and Takuya Matsuzaki and Naoaki Okazaki and Yusuke Miyao and Jun'ichi Tsujii},
  title     = {{The UOT System: Improve String-to-Tree Translation Using Head-Driven Phrasal Structure Grammar and Predicate-Argument Structures}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {99-106},
}

@inproceedings{iwslt09:EC:tottori,
  author    = {Jin'ichi Murakami and Masato Tokuhisa and Satoru Ikehara},
  title     = {{Statistical Machine Translation adding Pattern-based Machine Translation in Chinese-English Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {107-112},
}

@inproceedings{iwslt09:EC:tubitak,
  author    = {{Co\c{s}kun} Mermer and Hamza Kaya and Mehmet U\u{g}ur Do\u{g}an},
  title     = {{The T\"{U}BITAK-UEKAE Statistical Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {113-117},
}

@inproceedings{iwslt09:EC:upv,
  author    = {Guillem Gasc\'{o} and Joan Andreu S\'{a}nchez},
  title     = {{UPV Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {118-123},
}

@inproceedings{iwslt09:EC:uw,
  author    = {Mei Yang and Amittai Axelrod and Kevin Duh and Katrin Kirchhoff},
  title     = {{The University of Washington Machine Translation System for IWSLT 2009}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {124-128},
}

@inproceedings{iwslt09:TP:bisazza,
  author    = {Arianna Bisazza and Marcello Federico},
  title     = {{Morphological Pre-Processing for Turkish to English Statistical Machine Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {129-135},
}

@inproceedings{iwslt09:TP:cmejrek,
  author    = {Martin Cmejrek and Bowen Zhou and Bing Xiang},
  title     = {{Enriching SCFG Rules Directly From Efficient Bilingual Chart Parsing}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {136-143},
}

@inproceedings{iwslt09:TP:hayashi,
  author    = {Katsuhiko Hayashi and Taro Watanabe and Hajime Tsukada and Hideki Isozaki},
  title     = {{Structural Support Vector Machines for Log-Linear Approach in Statistical Machine Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {144-151},
}

@inproceedings{iwslt09:TP:hoang,
  author    = {Hieu Hoang and Philipp Koehn and Adam Lopez},
  title     = {{A Unified Framework for Phrase-Based, Hierarchical, and Syntax-Based Statistical Machine Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {152-159},
}

@inproceedings{iwslt09:TP:sanchis,
  author    = {Germ\'{a}n Sanchis-Trilles and Mauro Cettolo and Nicola Bertoldi and Marcello Federico},
  title     = {{Online Language Model Adaptation for Spoken Dialog Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {160-167},
}

@inproceedings{iwslt09:DEMO:nict,
  author    = {Chiori Hori and Sakriani Sakti and Michael Paul and Noriyuki Kimura and Yutaka Ashikari and Ryosuke Isotani and Eiichiro Sumita and Satoshi Nakamura},
  title     = {{Network-based Speech-to-Speech Translation}},
  year      = {2009},
  booktitle = {Proc. of the International Workshop on Spoken Language Translation},
  address   = {Tokyo, Japan},
  pages     = {168},
}
Venue
Gallery
IWSLT 2009, December 1-2, 2009, National Museum of Emerging Science and Innovation, Tokyo, Japan
Organizers
Organizers
- Alex Waibel (CMU, USA / UKA, Germany)
- Marcello Federico (FBK, Italy)
- Satoshi Nakamura (NICT, Japan)
Chairs
- Eiichiro Sumita (NICT, Japan; Workshop)
- Michael Paul (NICT, Japan; Evaluation Campaign)
- Marcello Federico (FBK, Italy; Technical Paper)
Program Committee
- Laurent Besacier (LIG, France)
- Francisco Casacuberta (ITI-UPV, Spain)
- Boxing Chen (NRC, Canada)
- Philipp Koehn (Univ. Edinburgh, UK)
- Philippe Langlais (Univ. Montreal, Canada)
- Geunbae Lee (Postech, Korea)
- Yves Lepage (GREYC, France)
- Haizhou Li (I2R, Singapore)
- Qun Liu (ICT, China)
- José B. Mariño (TALP-UPC, Spain)
- Coskun Mermer (TUBITAK, Turkey)
- Christof Monz (QMUL, UK)
- Hermann Ney (RWTH, Germany)
- Holger Schwenk (LIUM, France)
- Wade Shen (MIT-LL, USA)
- Hajime Tsukada (NTT, Japan)
- Haifeng Wang (TOSHIBA, China)
- Andy Way (DCU, Ireland)
- Chengqing Zong (CASIA, China)
Local Arrangements
Supporting Organizations
Contact
WORKSHOP ORGANIZATION
Eiichiro Sumita
(reverse) email: jp *dot* go *dot* nict *at* sumita *dot* eiichiro
EVALUATION CAMPAIGN
Michael Paul
(reverse) email: jp *dot* go *dot* nict *at* paul *dot* michael
TECHNICAL PAPER
Marcello Federico
(reverse) email: eu *dot* fbk *at* federico
LOCAL ARRANGEMENT
Mari Oku
(reverse) email: jp *dot* go *dot* nict *dot* khn *at* iwsltlocal09
National Institute of Information and Communications Technology (NICT)
Knowledge Creating Communication Research Center
MASTAR Project
2-2-2 Hikaridai, Keihanna Science City, Kyoto 619-0288, Japan
TEL: +81-774-95-1301
FAX: +81-774-95-1308
References
Events Co-located with IWSLT 2009
IWSLT Evaluation Campaigns