
Evaluation Server

We have prepared an online evaluation server that allows you to conduct additional experiments to confirm the effectiveness of innovative methods and features within the IWSLT 2008 evaluation framework. You can submit translation hypothesis files for each data track. The hypothesis file format is the same as for the official run submissions.

Before you can submit runs, you have to register a UserID/PassID. After logging in, select the "Translation Direction" and "Training Data Condition" you used to generate the hypothesis file, specify a system ID and a short description that lets you easily identify the run submission, and press "Calculate Scores".

The server will calculate automatic scores for BLEU/NIST, WER/PER, and METEOR/GTM in turn. The automatic scoring results will then be sent to you via email. In addition, you can access the "Submission Log", which keeps track of all your run submissions. Scoring results under the official evaluation specifications (case-sensitive, with punctuation) are displayed in boldface; scoring results under the additional evaluation specifications (case-insensitive, without punctuation) are displayed in parentheses. For BLEU/NIST, confidence intervals computed with a bootstrap method (1000 iterations) are also reported. The average of the BLEU and METEOR scores ((B+M)/2) will be used to rank the MT systems.
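The bootstrap procedure mentioned above can be sketched as follows: resample the test-set segments with replacement, rescore each resample, and take percentiles of the resulting score distribution. This is a minimal illustrative sketch, not the server's actual implementation; `score_fn` is a placeholder for any corpus-level scorer such as BLEU or NIST.

```python
import random

def bootstrap_ci(hyps, refs, score_fn, iterations=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a corpus-level MT metric.

    hyps, refs -- parallel lists of hypothesis and reference segments.
    score_fn(hyps, refs) -> float is a placeholder for a corpus-level
    scorer (e.g. BLEU or NIST); this is not the official IWSLT code.
    """
    rng = random.Random(seed)
    n = len(hyps)
    scores = []
    for _ in range(iterations):
        # Resample test-set segments with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(score_fn([hyps[i] for i in idx],
                               [refs[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * iterations)]
    upper = scores[int((1 - alpha / 2) * iterations) - 1]
    return lower, upper
```

With 1000 iterations and alpha = 0.05, this returns an approximate 95% confidence interval for the metric, which is how the BLEU/NIST intervals reported by the server should be read.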