Before you can submit runs, you must register a UserID/PassID. After logging in, select the "Translation Direction" and "Training Data Condition" you used to generate the hypothesis file, specify a system ID and a short description that lets you easily identify the run submission, and press "Calculate Scores".
The server will successively calculate automatic scores for BLEU/NIST, WER/PER, and METEOR/GTM. Finally, the automatic scoring results will be sent to you via email. In addition, you can access the "Submission Log", which keeps track of all your run submissions. The scoring results for the official evaluation specifications (case-sensitive, with punctuation) are displayed in boldface, and the scoring results for the additional evaluation specifications (case-insensitive, without punctuation) are displayed in parentheses. For BLEU/NIST, confidence intervals using a bootstrap method (1000 iterations) are also calculated. The average of the BLEU and METEOR scores ((B+M)/2) will be used for ranking the MT systems.
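The ranking formula and the bootstrap idea can be sketched as follows. This is a minimal illustration, not the server's actual implementation: the function names are hypothetical, and the bootstrap here resamples a list of per-sentence scores and averages them, whereas the server's bootstrap recomputes corpus-level BLEU/NIST on each resampled set.

```python
import random
import statistics

def ranking_score(bleu: float, meteor: float) -> float:
    """System ranking score: the average (B + M) / 2 of BLEU and METEOR."""
    return (bleu + meteor) / 2.0

def bootstrap_ci(sentence_scores, iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval (default 1000 iterations).

    Simplified sketch: resample the per-sentence scores with replacement,
    average each resample, and take the alpha/2 and 1 - alpha/2 percentiles
    of the resulting distribution of means.
    """
    rng = random.Random(seed)
    n = len(sentence_scores)
    means = sorted(
        statistics.fmean(rng.choices(sentence_scores, k=n))
        for _ in range(iterations)
    )
    lo = means[int((alpha / 2) * iterations)]
    hi = means[int((1 - alpha / 2) * iterations) - 1]
    return lo, hi
```

For example, a system with BLEU 0.40 and METEOR 0.60 would receive a ranking score of 0.50.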